Search for a large Job portal


I need some clarification on the best search algorithms to use. I work with Netlink Technologies. We are currently planning to architect a large job portal in Drupal. We have convinced our organization to use Drupal 6 and to create custom nodes and modules. We are also planning to benchmark the different search options that we could adopt.

For search, we are currently trying to understand ApacheSolr and Sphinx.

Do you think we are proceeding in the right direction? Will Drupal with Solr be a scalable option for a large job portal?

Shyamala
Tech Head
Netlink Technologies
http://www.netlinkindia.com

Comments


robertDouglass's picture

Hi Shyamala,

I think that ApacheSolr (http://drupal.org/project/apachesolr), Sphinx, and the newer Xapian integration module (http://drupal.org/project/xapian) are all interesting options.

Sphinx is used by NowPublic.com and they are very happy with its speed. Jeremy Andrews and Trellon think that Xapian is great, and it has been benchmarked to be 6x faster than Drupal 5 search. I think that ApacheSolr is the richest in terms of features because it natively supports faceted searching. Faceted searching is great because it guides the user to the result they are seeking: each facet lists the values found in the current result set, and clicking one narrows the results to that value. See the example at http://robshouse.net/search/apachesolr_search/drupal (the facets are the blocks on the right).
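To make the faceting idea concrete, here is a minimal sketch of the request parameters a faceted Solr query uses. The parameters `q`, `fq`, `facet`, `facet.field` and `facet.mincount` are standard Solr query parameters; the field names `location` and `job_type` and the helper `build_facet_query` are hypothetical and would have to match whatever your own schema defines. This is not how the ApacheSolr module builds its queries, only an illustration of what goes over the wire.

```python
from urllib.parse import urlencode


def build_facet_query(keywords, facet_fields, filters=None, rows=10):
    """Build the query string for a faceted request to Solr's select handler.

    facet_fields: fields to count values for (hypothetical names).
    filters: facet values the user has already clicked, as {field: value}.
    """
    params = [
        ("q", keywords),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", 1),  # hide facet values with zero matches
    ]
    for field in facet_fields:
        params.append(("facet.field", field))
    # Each clicked facet becomes a filter query that narrows the result set.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


query = build_facet_query(
    "drupal developer",
    facet_fields=["location", "job_type"],
    filters={"location": "Chennai"},
)
print(query)
```

The resulting string would be appended to the select URL of your Solr server. The response then carries both the matching documents and, per facet field, the counts that the facet blocks display.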

As for scalability, the underlying Solr technology is fast. It is also horizontally scalable, so you can add more servers as search traffic grows. This is why Netflix uses it: http://www.netflix.com/BrowseSelection
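As a rough illustration of what adding more servers means for read traffic, the sketch below rotates search requests across several Solr servers that are assumed to hold copies of the same index. The host names and the `ReplicaPool` class are hypothetical, how the copies are kept in sync is not shown, and in practice a load balancer in front of the servers would usually do this job.

```python
from itertools import cycle


class ReplicaPool:
    """Hand out Solr select URLs in round-robin order (illustrative only)."""

    def __init__(self, base_urls):
        urls = list(base_urls)
        if not urls:
            raise ValueError("at least one Solr server is required")
        self._urls = cycle(urls)

    def next_select_url(self):
        # Each call returns the select handler of the next server in turn.
        return next(self._urls) + "/select"


# Hypothetical hosts; adding capacity means adding another entry here.
pool = ReplicaPool([
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
])
```

Each search request would call `pool.next_select_url()` to pick its target, so query load is spread evenly over however many servers are listed.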

ApacheSolr is being very actively developed at the moment. There is also good momentum behind Xapian, but not so much for Sphinx. Therefore I would choose either ApacheSolr or Xapian, because it is always easier to deliver and maintain a good product if the rest of the community is working on it as well.

One thing that you should consider is that search has improved in Drupal 6, and it is worth asking whether you need a third-party solution at all. In particular, the combination of the Views module (to select an initial subset of content) and a search filter (to do a keyword search within the subset) is very powerful. This is best done with the core search module. In this case the search is just a filter on the view.

It is possible to combine the approaches as well. Use ApacheSolr for general site search and use core search plus Views for very targeted searches.

You will want to look at the core searches module, too: http://drupal.org/project/coresearches. It is a patch and two modules that move the core user and content search into their own modules so that you can turn them off if you don't want them. It is very useful in conjunction with ApacheSolr.
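The two-step idea (select a subset first, then keyword-search inside it) can be sketched independently of Drupal. The example below is an in-memory illustration in Python, not Views or core search code: the job records, field names and both helper functions are made up, and the keyword match is a plain substring test rather than a lookup in Drupal's search index.

```python
def select_subset(jobs, **criteria):
    """Step 1: keep only records whose fields equal the given criteria."""
    return [
        job for job in jobs
        if all(job.get(field) == value for field, value in criteria.items())
    ]


def keyword_filter(jobs, keywords):
    """Step 2: keep records whose title or body contains every keyword."""
    terms = keywords.lower().split()
    return [
        job for job in jobs
        if all(term in (job["title"] + " " + job["body"]).lower() for term in terms)
    ]


# Made-up sample data standing in for job nodes.
JOBS = [
    {"id": 1, "type": "job", "location": "Chennai",
     "title": "Drupal developer", "body": "Build custom modules."},
    {"id": 2, "type": "job", "location": "Pune",
     "title": "Drupal themer", "body": "Theme a job portal."},
    {"id": 3, "type": "job", "location": "Chennai",
     "title": "Java architect", "body": "Design search services."},
]

subset = select_subset(JOBS, type="job", location="Chennai")
matches = keyword_filter(subset, "drupal")
print([job["id"] for job in matches])  # prints [1]
```

Because the keyword search only ever sees the pre-selected subset, the structured criteria do the heavy narrowing and the text search stays cheap.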

Thanks for the extremely

Non-Node content type

LarryEitel's picture

I have been exploring Solr and have it running on my local XAMPP install. It looks very interesting. I do, however, need to include a custom data type. This type does NOT extend the node type, and your module is based on using nodes.

I read a reference elsewhere that you can modify schema.xml to reflect other field definitions. I assume that every field of any content type should be represented in this schema.

Aside from amending this file, it SEEMS straightforward that I can reverse engineer your module to add hooks for a non-node content type.

I need to work with a million or more records containing multilingual text.
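For what it is worth, Solr itself does not know about nodes: it accepts any document whose fields are declared in schema.xml. A minimal sketch of building Solr's XML update message (`<add><doc><field name="...">`) for arbitrary records follows. The field names (`id`, `title_en`, `title_ta`, `skills`), the sample records and the helper `build_add_message` are hypothetical, and this is not the ApacheSolr module's own indexing code.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Serialize records (dicts) into a Solr <add> update message.

    List or tuple values become repeated <field> elements, which is how
    multi-valued fields are sent to Solr.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    # encoding="unicode" returns a str and keeps non-ASCII text intact.
    return ET.tostring(add, encoding="unicode")


# Made-up records that are not Drupal nodes.
records = [
    {"id": "resume-1", "title_en": "PHP developer", "skills": ["php", "mysql"]},
    {"id": "resume-2", "title_ta": "மென்பொருள் பொறியாளர்", "skills": ["java"]},
]
message = build_add_message(records)
print(message)
```

The message would be POSTed to the server's update handler and followed by a `<commit/>` so the documents become searchable. With a million or more records, sending them in batches rather than one request per record is the usual approach.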

Thank you for your work on this. :)

Lucene, Nutch and Solr
