Lucene, Nutch and Solr


Lucene is a fabulous indexer, Nutch is a superb web crawler, and Solr can tie them together and offer world-class search. This group discusses the various projects and efforts to integrate these technologies with Drupal.

The ApacheSolr module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is faceted search, with facets ranging from content author to taxonomy to arbitrary CCK fields.
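
As a rough sketch of what a faceted request looks like at the HTTP level, the snippet below builds a query URL using Solr's standard `facet` and `facet.field` request parameters. The host, port and path, and the field names `author` and `taxonomy_term`, are assumptions made for illustration; they are not taken from the ApacheSolr module or its schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, keywords, facet_fields, rows=10):
    """Return a Solr select URL that also requests facet counts."""
    params = [
        ("q", keywords),       # the user's search keywords
        ("rows", rows),        # number of results to return
        ("wt", "json"),        # ask for a JSON response
        ("facet", "true"),     # turn faceting on
    ]
    # One facet.field parameter per field to count values for.
    params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and hypothetical field names.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "drupal search",
    ["author", "taxonomy_term"],
)
print(url)
```

Each `facet.field` entry asks Solr to return value counts for that field alongside the normal results, which is what a facet block on the search page would display.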

Drupal projects that already provide some level of integration with Lucene and/or Nutch:

BlakeLucchesi's picture

SearchAPI Module

The following is my first revision of a proposal to create a search API module. I'd love to get some feedback.

Project Details
A Drupal search API would allow for separation between the search interface that end users interact with and the back-end indexing and retrieval work that a search engine performs. The advantages to creating a search API are:
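
As a rough illustration of the separation this proposal describes, the sketch below puts indexing and retrieval behind a small interface and lets the front end depend only on that interface. All names here (`SearchBackend`, `InMemoryBackend`, `search_page`) are hypothetical and are not part of any existing Drupal module; the in-memory backend merely stands in for a real engine such as Lucene or Solr.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Back-end contract: indexing and retrieval only."""

    @abstractmethod
    def index(self, doc_id, text):
        """Add a document to the index."""

    @abstractmethod
    def search(self, keywords):
        """Return the ids of documents matching all keywords."""


class InMemoryBackend(SearchBackend):
    """Toy backend; a Lucene or Solr backend would implement the same methods."""

    def __init__(self):
        self._postings = {}  # token -> set of document ids

    def index(self, doc_id, text):
        for token in text.lower().split():
            self._postings.setdefault(token, set()).add(doc_id)

    def search(self, keywords):
        tokens = keywords.lower().split()
        if not tokens:
            return []
        # A document matches only if it contains every keyword.
        matches = set.intersection(
            *(self._postings.get(token, set()) for token in tokens)
        )
        return sorted(matches)


def search_page(backend, keywords):
    """Front end: formats results and knows nothing about the engine."""
    return [f"result: {doc_id}" for doc_id in backend.search(keywords)]


backend = InMemoryBackend()
backend.index(1, "Apache Solr search")
backend.index(2, "Lucene search library")
print(search_page(backend, "solr search"))
```

Swapping in a different engine would then mean writing another `SearchBackend` subclass, with no change to `search_page`.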

drunken monkey's picture

Improving the Apache Solr Search Integration module

I am planning to hand in a proposal on improving the Apache Solr Search Integration module.
The project would include:

  • Porting the module to Drupal 6 (if necessary)
  • Integration in Views 2, enabling the use of Views 2 as a front-end to display the search results
  • Writing simpletest unit tests for this module, especially for the new functionality

What's your opinion on that? I have already contacted Robert Douglass to ask for his.



Building a killer search for Drupal

We've had a good discussion today at Drupalcon, in a BoF session led by Robert Douglass. Here's the plan that emerged to build a killer search for Drupal that will help take us Drupalers further towards world domination. ;)

robertDouglass's picture

New Solr module available for testing

I've started to write a new module for Solr integration. After finally getting around to testing and trying it extensively, I can say that Solr is one of the coolest things I've seen in the search space. The module that I've written departs from the current Solr project on Drupal.org in that it doesn't conflict with core search, but rather plugs into the core search framework. I need people who know a lot about Solr to look at my work and help me figure some things out:

  • How best to support multiple search indexes?
sodani's picture

Non Java options?

Lucene and Nutch may sound good if you're familiar with Java, but what if you're not? Since Drupal is written in PHP, are there any PHP crawlers that Drupal might integrate well with? I've tried phpdig but found it to be slow and not well supported.

I've also found rdig, a Ruby module, but it doesn't seem to have much documentation or support. I'd love to hear other people's opinions on this.

jvandervort's picture

New Zend Framework 1.0.0 RC

http://framework.zend.com/

Some Lucene goodies:

  • Zend_Search_Lucene
    ZF-1262 CaseInsensitive.php missing require_once for the class it extends
    ZF-1263 DocumentWriter.php missing require_once for the class it extends
    ZF-1264 Default.php missing require_once for the class it extends
    ZF-1350 testUtf8(Zend_Search_Lucene_AnalysisTest) failing
    ZF-1351 testUtf8Num(Zend_Search_Lucene_AnalysisTest) failing
    ZF-1365 Zend_Search_Lucene::close() not setting or checking the $_close flag
    ZF-1376 Error in sample code of section 27.3.1.1 - Query Parsing
Zac1's picture

University assignment

Hello All,

I've got a university assignment to create a dynamic URL categorization tool:
the ability to match each website to one of 60 predefined categories.
We already have categorized URLs from DMOZ.

Since I am a great Drupal lover,
I thought I might combine Drupal, Lucene, Nutch and
some Bayesian/SVM AI to create this application.

Is anyone familiar with such a feature in Lucene, or with some integration?
Any comments are welcome.

Thanks a lot!
Zac.
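
For readers wondering what the Bayesian part of such a tool might look like, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing over URL tokens. It is an assumption-laden illustration, not something Lucene or Nutch provides: the class name, the tokenizer, the two categories and the training URLs are all made up, and a real tool would train on the categorized DMOZ URLs and probably on page text as well.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", url.lower())


class NaiveBayesUrlClassifier:
    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token frequencies
        self.doc_counts = Counter()               # category -> training URLs seen
        self.vocabulary = set()

    def train(self, url, category):
        tokens = tokenize(url)
        self.token_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def classify(self, url):
        """Return the most probable category, or None if untrained."""
        tokens = tokenize(url)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.token_counts[category]
            # Add-one smoothing keeps unseen tokens from zeroing the score.
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training data standing in for categorized DMOZ URLs.
classifier = NaiveBayesUrlClassifier()
classifier.train("http://example.com/sports/football-scores", "Sports")
classifier.train("http://example.org/football/league", "Sports")
classifier.train("http://example.com/recipes/pasta", "Cooking")
classifier.train("http://example.net/cooking/recipes/soup", "Cooking")

print(classifier.classify("http://example.org/football/news"))
```

Working in log space avoids numeric underflow when many tokens are multiplied together, which matters once page text is added to the features.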

jvandervort's picture

CCK fields and custom searching

Any ideas on using Lucene indexing with CCK fields, custom searches, and field weighting?

Just curious...
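
One possible approach to field weighting, sketched here under assumptions, is query-time boosting with the Lucene query parser's `field:term^boost` syntax. The helper below only builds the query string; the field names (`title`, `field_summary`, `body`) are hypothetical, and it assumes a single search term with no characters that need escaping.

```python
def boosted_query(term, field_boosts):
    """Build a Lucene query string searching one term across weighted fields."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1:
            # ^N raises the weight of matches in this field.
            clause += f"^{boost}"
        clauses.append(clause)
    return " OR ".join(clauses)


# Hypothetical fields: matches in the title count most, then a CCK field.
query = boosted_query("solr", {"title": 3, "field_summary": 2, "body": 1})
print(query)  # title:solr^3 OR field_summary:solr^2 OR body:solr
```

Indexing each CCK field as its own Lucene field is what makes this kind of per-field weighting and custom searching possible in the first place.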

jvandervort's picture

New Zend Framework 0.9.0

Zend Framework Beta 0.9.0

Zend_Search_Lucene: now matches the performance of Java Lucene

or so they say. We'll see...

  • Zend_Search_Lucene
    ZF-96 Implement Search Highlighting
    ZF-295 Implement score normalisation
    ZF-626 Exception when adding document using static variables
    ZF-693 Using unoptimized indexing database damages storage
    ZF-943 Java examples are no longer necessary.
    ZF-1002 Document deleting/updating problem
    ZF-1050 Result sorting problem
jvandervort's picture

New Zend Framework 0.8.0

For those keeping up-to-date. I'll be loading it next week.

They are claiming: Great performance improvement for Zend_Search_Lucene.

robertDouglass's picture

I'll be presenting "search" in Sunnyvale

The OS-CMS (aka Yahoo! Drupalcon) is coming up, and I'll be presenting Drupal's search capabilities. I intend to demo Lucene too, whether it is code from one of the J's or just some demo code I put together for the purpose. I may be calling on you for help, too! Anyway, just wanted to let people know. Come to the conference, it'll be fun!

robertDouglass's picture

Check out the Solr project!

Whooohooo, these are exciting times for those interested in Lucene =)

http://drupal.org/project/solr

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

http://lucene.apache.org/solr/
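
To give a feel for the JSON API, the sketch below reads facet counts out of a Solr JSON response. The sample response is hand-written for illustration and is not output from a real server; it assumes Solr's default JSON layout, in which each entry under `facet_fields` is a flat list alternating between a value and its count. The field name `author` and its values are made up.

```python
import json


def facet_counts(response_text, field):
    """Return {value: count} for one facet field of a Solr JSON response."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Pair up the alternating [value, count, value, count, ...] entries.
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a Solr JSON response.
sample = """
{
  "response": {"numFound": 3, "start": 0, "docs": []},
  "facet_counts": {
    "facet_fields": {"author": ["alice", 2, "bob", 1]}
  }
}
"""

counts = facet_counts(sample, "author")
print(counts)  # {'alice': 2, 'bob': 1}
```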

robertDouglass's picture

Share code. Compare code. Agree on roadmap. Create project.

The title says it all. This is what we need to do now. I have no code, but J & J both do. Please share your code, compare where you are, admire each other's brilliance, and propose a scope and roadmap for the Drupal project.

jvandervort's picture

Zend-Lucene for Non-Drupal Search

To answer some of Robert's questions and open up discussion, I've implemented a non-Drupal search using the Zend port

jsloan's picture

Zend Search for Drupal

Although this group is focusing on Drupal integration with external Java Lucene indexers, I've been invited to join this group to announce a PHP implementation of the Lucene search for Drupal. So this will be a slightly different direction for the group.

I've begun work on a 4.7 module that implements Lucene search from the Zend Framework. The framework is still a moving target: limited UTF-8 support was introduced last week, and the mailing list offers this hint: "Full utf-8 support is planned, but will come later."

evaalguy's picture

Yacy would be great for Drupal

While Nutch is a great tool for indexing the web, its greatest bottleneck is the cost borne by individual users. I think a tool such as YaCy complements Drupal better, as it allows for distributed crawling and indexing among many separately owned servers.
