Improving the Apache Solr Search Integration module

Project information

The project is finished; the final release can be downloaded here.
Besides the old prototype, it contains a complete Solr backend for Views, which can be used by applying the included patch to the Views module. I also posted the patch to the Views issue queue in the hope that it might be committed directly to Views, but so far no one has responded.
Once the patch is applied, Solr views can be easily created by selecting the Apache Solr base table when adding a new view.
This will also be merged into the normal apachesolr D6 branch soon.

Project page on drupal.org: Post to issue queue; apachesolr project (HEAD is for Views integration development); Demo site
Current status: Finished.

Description

The Apache Solr Search Integration module is a very convenient tool for integrating the Apache Solr search facilities into a Drupal project. Yet one of its flaws is that its way of presenting search results is rather static, lacking the customisation facilities that, for example, the Views module provides. This project aims to improve the usability of the apachesolr module by porting it to the latest version of Drupal, presenting its search results through the Views front-end and adding the possibility to index attachments.

The first goal of the project will be to port the apachesolr module (including the apachesolr_search module) to Drupal 6 (if not already done). This step also includes evaluating which new possibilities Drupal 6 offers that are relevant to the apachesolr module, and to what extent Drupal 6 makes it easy to add new functionality to the module. Depending on this, new functionality may be added.

Subsequently, the main goal will be to research ways of presenting Solr searches through the Views front-end, choose the best one according to criteria such as usability, performance and how far it would help others integrate other search engines, and finally implement it. So far, possible methods include:

  • Implementing the Solr search as a Views filter, which would certainly work, but would probably not perform very well with respect to the criteria mentioned.
  • Using just the Views front-end and feeding it with data from the Solr searches. It remains to be evaluated to what extent this would introduce unnecessary dependencies and whether it could be made abstract enough to help others integrate other search engines.

An additional goal will then be to implement a way of indexing attachments via the module, which is supported by Lucene. This has already been proposed in the apachesolr issue queue and is therefore an agreed-upon target for the future development of the module.

Also part of this project will be the creation of simpletest test cases for the apachesolr module, with emphasis on the new functionality.

Project schedule

  • 06/08: Ported the apachesolr and apachesolr_search modules to drupal 6
  • 07/06: Have decided on the Views integration method to implement
  • 07/20: Finished work on Views integration
  • 08/11: Completed implementing the attachment indexing mechanism

Continuously: writing tests for new code.

Status updates

2008-05-27

Task summary: Locally installed the necessary environment. Committed an incomplete D6 port to a newly created D6 branch of the apachesolr module and tested what works and what doesn't.
Next week I'll start completing the D6 port of the module and looking into what opportunities D6 holds for it.

Change Log:

  • Installed clean versions of Drupal 5 and 6.2
  • Added the apachesolr module to the D5 version and an incomplete D6 port to the D6 version
  • Added simpletest, views, cck and coder modules to the D6 version

2008-06-03

Task summary: Nearly finished porting the module, with some help. Next week I'll write/update the tests for that and correct what's still buggy. And maybe I'll already start looking deeper into the Views integration problem.

Difficulties encountered:

  • My bachelor thesis is taking a bit longer than I expected, which might delay the initial schedule slightly.

Change Log:

  • Committed a new version of the module's D6 port to CVS

2008-06-10

Task summary: Completely finished the D6 port (at least, there are no new bug reports) and wrote/updated a bunch of tests. Next week I may add some small features, and I will definitely start looking into the Views integration issue.

Difficulties encountered:

  • Because of several events (the mentioned thesis, exams, a festival), I won't have much time this week. But by next Wednesday, everything should be resolved and I should finally be able to completely concentrate on this project.

Change Log:

  • Committed a new (hopefully bug free) version of the module's D6 port and several tests.

2008-06-17

Task summary: As announced, I couldn't work very much on the project this week. I almost finished a patch that adds one more feature and briefly looked into the Views integration. Next week I plan to accomplish much more: finish that patch and possibly some others, perhaps add tests for them, and certainly get properly started on the Views integration.

Change Log:

  • None.

2008-06-24

Task summary: Finished the feature patch and some minor style issues. Worked on the first Views integration prototype, which I'll continue working on next week.

Change Log:

  • None. Waiting for feedback before committing the changes mentioned above.

2008-07-01

Task summary: Added some other minor features, and finished and published the first Views integration prototype. Researched other approaches for the second prototype, as well as ways to improve the first. Next week I'll continue the research and start coding the second prototype. I'll also try to improve the first one, and I'll set up a public test installation with the first prototype (and maybe the second, if it is done by then).

Change Log:

  • Committed several changes to the D5 and D6 branches of apachesolr.
  • Moved Views integration development to the project's HEAD.

2008-07-08

Task summary: Created the required alpha release and a test installation for demo and testing purposes. Improved the first prototype and planned the second.

Change Log:

  • Committed some small changes to the D5 and D6 branches of apachesolr.
  • Created alpha release of D6 version of apachesolr.
  • Created test site and alpha release.

2008-07-15

Task summary: Made several small updates to the module and to the first prototype. E.g., the facet blocks now work with the view, too. Also discussed possible layouts for the second prototype and started implementing.
Next week I'll continue implementing the second prototype and keep working on the module and the first prototype, if the chance arises.

Change Log:

  • Committed some small changes to the D5 and D6 branches and HEAD of apachesolr.

2008-07-22

Task summary: Wrote a new query class that Views could use to display results from Solr instead of from the database, and worked on generalizing some module functionality so that it works with views on arbitrary paths. Also did some unrelated work on other parts of the module.
Next week I'll try to patch Views in a way that allows one to use such a query class to select an individual back-end for a base table.

Change Log:

  • Several commits to HEAD, including the new Views query class and generalizations of parts of the module that previously only worked for the static "search//" path.

2008-07-29

Task summary: I started on the Views patch but am not quite finished yet. I have already written the base table definition Views needs to know about Solr, but so far Views isn't using the query class I specified.
Next week I hope to finish this and post the patch.

Change Log:

  • Some commits to HEAD.

2008-08-05

Task summary: I worked on rewriting the Views backend to allow alternative query classes to be defined and used. I hoped to get this done by today, but, alas, I haven't, so I'll continue working on it next week. At the moment it already uses the right class and produces no errors, but the results aren't displayed. On the positive side, all normal views still work as before.

Difficulties encountered:

  • Rewriting the Views backend is a bit more complicated than I thought. At the moment it seems like I'm almost done and there is only one error left somewhere, but I don't know how quickly I'll find it or whether it's really the only one.
  • Also, I won't have internet access for a few days from now on, hence the early report. I still hope to get some work done on my laptop, though.

Change Log:

  • Some commits to HEAD, including a first draft of the Views patch.

2008-08-12

Task summary: With some help I finally finished the second prototype and posted the resulting Views patch to the Views issue queue for review. I also extended it a bit already, so the argument now works, too.
In this last week I plan to further extend it, work on the Views patch if necessary and also try to get the attachment indexing mechanism implemented by febbraro into the module.

Change Log:

  • Committed a working version of the second prototype to HEAD.

Comments

Please commit first, get feedback later :)

Posted by webchick

I've been noticing a trend where people say things in their status reports like they're waiting on $thingy to commit their code. Please just COMMIT YOUR CODE. :) Ideally, you want to have commits at least daily, sometimes 20 times a day. It's good that you've been filling out status reports, but when we don't see commits happening on projects for 7+ days, we start to get nervous. ;)

Every time you get a little piece done, it should be committed. This allows someone to go back and view the history of how the project evolved, and also allows you to roll back individual changes that might've broken something without having to go back and figure out what all you need to change manually. Let the computers do the hard work! ;) It also lets people look at what you're doing and provide feedback as you're doing it.

I'll try, but it's not so easy

Posted by drunken monkey

Since I'm working on an existing project that already works, I can't just commit anything: people who just want to download the module and don't want to help with testing would be frustrated to find non-functioning bits in it.
In a few days we'll make an official release, which will make committing a bit easier, but then there is another problem:

Since I'm creating several prototypes for the main task, I can't just commit that code without adding a branch for each one, which would be a bit excessive just for this task.
So, I'll try and commit as often as I can, but I'm afraid that most things will have to stay in patch form most of the time.

Apart from that, this is at its core mostly an R&D project, so the patches and commits don't reflect all of the work anyway.

Posted by webchick

....as outlined at the initial "Welcome" post at http://groups.drupal.org/node/10879.

It's fine if this project goes away at the end of SoC; many will, including http://drupal.org/project/color_soc08 and http://drupal.org/project/new_aggregator, once they get into Drupal core. But if you don't have commit access to apache_solr, we still need a way to download the same version of the code that you're working on, as well as see your incremental progress.

How about Views 1?

Posted by jayjaylabz

Hello Thomas! I tried the test installation you linked above and found it good. I expected it to use Views 1, but soon realized it uses Views 2 instead. I'm actually very excited about the release, as I've waited a long time for this. Any help in developing a Views 1 counterpart of the module you just released would be highly appreciated. I'm looking forward to it. Thanks...

Let's keep in mind

Posted by cwgordon7

Let's keep in mind that in order for SoC projects to have the most effect, they should be done for the latest stable version of Drupal (6.x and Views 2) rather than older versions (5.x and Views 1). I don't think that backporting to 5.x should be considered at all important, if even desirable, for any SoC project.

Yeah I actually anticipated

Posted by jayjaylabz

Yeah, I actually anticipated that you would point this out. I understand that you are doing this foremost as an SoC project, and what I'm asking for is not compulsory, given that you are short on time. I'm not part of SoC 2008, and the help I humbly asked for is just for the site I maintain with my company. That site still runs on 5.x, and considering how large it is (www.worthpoint.com), an upgrade to 6.x is almost impossible right now and would not be accepted by our clients at this time. That's why I'm asking for anyone who could help me backport the module or help in any other way.
Just tell me if I've posted in the wrong thread. I think I have already posted in every thread on this topic (which shows how badly I need this... :( ).
Thanks.

Staying on Drupal 5

Posted by jpetso

Off-topic, but you'll need to upgrade to Drupal 6 sooner or later anyway; otherwise you'll get into trouble in the long term because of security support (which ends for Drupal 5.x when 7.0 is released) and, of course, new developments that you will likely want to use, like this one. Staying on Drupal 5 for the time being is OK, but I would strongly advise you to rethink (and discuss with your client) the decision to stay on Drupal 5 indefinitely. In the long term, it might cost you more than going with the flow and working on the upgrade before you are forced and rushed into it.

Thanks for the piece of advice man!

Posted by jayjaylabz

I've come to see upgrading as essential for the site itself. Maybe it will be on our plate sooner or later. Thank you so much for the response. I hope I can raise this issue and prepare for the possible upgrade. Further help is still highly appreciated. :)

At the risk of being much too late...

Posted by drunken monkey

As catch pointed out here, Views 1 is simply not flexible enough to support this in anything near its current form. It could still work, but you'd essentially have to hack the Views module and cobble together something that would be totally different from my project code-wise.

So for this functionality, D6 is really a requirement and I'm afraid I couldn't be of much use hacking something together for Views 1/D5.

Indexing of attachments

Posted by febbraro

Hey there, I see that this project planned to integrate file attachments into the Apache Solr Search Integration module. I had a client who needed this ASAP, so I already wrote it. I still need to integrate it better (really, I just need some refactoring and a hook or two). Hopefully I can get it integrated soon.

Would be great :D

Posted by drunken monkey

Yeah, I read your post to the issue queue, that's really great. :D
The only bad thing is that I won't be around for a few days from now on and therefore can't help if it's needed ASAP. But once I'm back I'll do my best to help you, if it's not already done by then.