Improving the Apache Solr Search Integration module
Project information
Project page on drupal.org: Post to issue queue; apachesolr project (HEAD is for Views integration development)
Current status: Finished first prototype for Views integration, researching/testing for second; adding small features to the module
Description
The Apache Solr Search Integration module is a very convenient tool for integrating the Apache Solr search facilities into a Drupal project. One of its flaws, however, is that its way of presenting search results is rather static and lacks the customisation facilities that, for example, the Views module provides. This project aims to improve the usability of the apachesolr module by porting it to the latest version of Drupal, integrating a way to present its search results with the Views front-end, and adding the ability to index attachments.
The first goal of the project will be to port the apachesolr module (including the apachesolr_search module) to Drupal 6 (if not already done). Part of this step is evaluating which new possibilities in Drupal 6 are relevant to the apachesolr module and how easily they allow new functionality to be added. Depending on the outcome, new functionality may be added to the module.
Subsequently, the main goal will be to research ways to provide the Solr search with the Views front-end, choose the best one and implement it. The criteria for that choice include usability, performance, and how much the approach would help others integrate another search engine. So far, the possible methods include:
- Implementing the Solr search as a Views filter, which would certainly work, but would probably not do very well on the criteria above.
- Using just the Views front-end and feeding it with data from the Solr searches. It remains to be evaluated to what extent this would introduce unnecessary dependencies and if it could be made abstract enough to aid others in integrating other search engines.
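To make the second option more concrete, here is a minimal sketch of the separation it implies: a display layer that only talks to a small search interface, so that a different engine could be swapped in. It is written in Python purely for illustration and is not Drupal, Views or apachesolr code. All names (`SearchBackend`, `InMemoryBackend`, `render_rows`) are hypothetical, and the in-memory backend is a toy stand-in for a real engine such as Solr.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    """One hit, reduced to the fields a display layer needs."""
    doc_id: str
    title: str
    score: float


class SearchBackend(Protocol):
    """Hypothetical interface: anything that turns a query into ranked hits."""

    def search(self, keys: str, limit: int) -> list[SearchResult]: ...


class InMemoryBackend:
    """Toy stand-in for a real search engine (assumption, not Solr itself)."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = docs

    def search(self, keys: str, limit: int) -> list[SearchResult]:
        terms = keys.lower().split()
        hits = []
        for doc_id, title in self.docs.items():
            words = title.lower().split()
            # Score = number of times any query term occurs in the title.
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append(SearchResult(doc_id, title, score))
        # Highest score first; ties broken by document id for stable output.
        hits.sort(key=lambda r: (-r.score, r.doc_id))
        return hits[:limit]


def render_rows(backend: SearchBackend, keys: str, limit: int = 10) -> list[str]:
    """Display layer: formats rows without knowing which engine ran the query."""
    return [f"{r.title} ({r.score:.1f})" for r in backend.search(keys, limit)]
```

In this sketch, supporting another search engine would mean writing one more class with a `search` method; `render_rows` would stay unchanged. Whether the Views front-end can be decoupled this cleanly is exactly what remains to be evaluated.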
An additional goal will then be to implement a way of indexing attachments via the module, which is supported by Lucene. This has already been proposed in the apachesolr issue queue and is therefore an agreed-upon target for the future development of the module.
Also part of this project will be the creation of simpletest test cases for the apachesolr module, with emphasis on the new functionality.
Project schedule
- 06/08: Ported the apachesolr and apachesolr_search modules to Drupal 6
- 07/06: Have decided on the Views integration method to implement
- 07/20: Finished work on Views integration
- 08/11: Completed implementing the attachment indexing mechanism
Continuously: writing tests for written code.
Status updates
2008-05-27
Task summary: Locally installed the necessary environment. Committed an incomplete D6 port to a newly created D6 branch of the apachesolr module and tested what works and what doesn't.
Next week I'll start completing the D6 port of the module and looking into what opportunities D6 holds for it.
Change Log:
- Installed clean versions of Drupal 5 and 6.2
- Added the apachesolr module to the D5 version and an incomplete D6 port to the D6 version
- Added simpletest, views, cck and coder modules to the D6 version
2008-06-03
Task summary: Nearly finished porting the module, with some help. Next week I'll write/update the tests for that and correct what's still buggy. And maybe I'll already start looking deeper into the Views integration problem.
Difficulties encountered:
- My bachelor thesis is taking a bit longer than I thought, which might delay the initial schedule slightly.
Change Log:
- Committed a new version of the module's D6 port to the CVS
2008-06-10
Task summary: Completely finished the D6 port (at least, there are no new bug reports) and wrote/updated a bunch of tests. Next week, I'll maybe add some small features and will definitely start to look into the Views integration issue.
Difficulties encountered:
- Because of several events (the mentioned thesis, exams, a festival), I won't have much time this week. But by next Wednesday, everything should be resolved and I should finally be able to completely concentrate on this project.
Change Log:
- Committed a new (hopefully bug-free) version of the module's D6 port and several tests.
2008-06-17
Task summary: As announced last week, I couldn't work very much on the project this week. Almost finished a patch that adds one more feature and briefly looked into the Views integration. Next week I plan to accomplish much more, i.e. finish that patch and maybe some others, maybe also add tests for those and certainly get started properly on the Views integration.
Change Log:
- None.
2008-06-24
Task summary: Finished the feature patch and some minor style issues. Worked on the first Views integration prototype, which I'll continue working on next week.
Change Log:
- None. Waiting for feedback before committing the changes mentioned above.
2008-07-01
Task summary: Added some other minor features, finished and published the first Views integration prototype. Researched other ways to do it for the second one, and for ways to improve the first. Next week I'll continue research and start coding on the second prototype. Also will try to improve the first. And I'll set up a public test installation with the first (and maybe the second, if done by then) prototype.
Change Log:
- Committed several changes to the D5 and D6 branches of apachesolr.
- Moved Views integration development to the project's HEAD.



Please commit first, get feedback later :)
I've been noticing a trend where people say things in their status reports like they're waiting on $thingy to commit their code. Please just COMMIT YOUR CODE. :) Ideally, you want to have commits at least daily, sometimes 20 times a day. It's good that you've been filling out status reports, but when we don't see commits happening on projects for 7+ days, we start to get nervous. ;)
Every time you get a little piece done, it should be committed. This allows someone to go back and view the history of how the project evolved, and also allows you to roll back individual changes that might've broken something without having to go back and figure out what all you need to change manually. Let the computers do the hard work! ;) It also lets people look at what you're doing and provide feedback as you're doing it.
I'll try, but it's not so easy
Since I'm working on an existing project that already works, I can't just commit anything. People who just want to download the module and don't want to help with testing would be a bit frustrated to find non-functioning bits in it.
In a few days we'll make an official release, which will make committing a bit easier, but then there is another problem:
Since I'm creating several prototypes for the main task, I can't just commit that code without adding a branch for each, which would be a bit oversized just for this task.
So, I'll try and commit as often as I can, but I'm afraid that most things will have to stay in patch form most of the time.
And apart from that, this is mostly an R&D project at its core, so the patches and commits don't cover all of the work either.
Each student should have a project for their SoC project
....as outlined at the initial "Welcome" post at http://groups.drupal.org/node/10879.
It's fine if this project goes away at the end of SoC; many will, including http://drupal.org/project/color_soc08 and http://drupal.org/project/new_aggregator, once they get into Drupal core. But if you don't have commit access to apache_solr, we still need a way to download the same version of the code that you're working on, as well as see your incremental progress.