Lucene, Nutch and Solr

Lucene is a fabulous indexer, and Nutch is a superb web crawler. Together, they can make the basis of a full-featured search engine. Lucene alone can be used to provide search services, such as those needed by a Drupal site. This group discusses the various projects and efforts being made to integrate these technologies with Drupal.

The ApacheSolr module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is the ability to have faceted search on facets ranging from content author to taxonomy to arbitrary CCK fields.
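
As a rough illustration of what a faceted request looks like on the wire, here is a minimal sketch that builds a Solr select URL using Solr's standard `facet` and `facet.field` parameters. The base URL and the field names `author` and `taxonomy` are assumptions made for the example; the fields actually available depend on the schema in use.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, keywords, facet_fields, rows=10):
    """Build a Solr select URL that also asks for facet counts."""
    params = [
        ("q", keywords),
        ("rows", rows),
        ("wt", "json"),          # ask for a JSON response
        ("facet", "true"),       # switch faceting on
        ("facet.mincount", 1),   # hide facet values with zero hits
    ]
    # Solr accepts facet.field repeated once per field to facet on.
    params.extend(("facet.field", name) for name in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical server location and field names.
url = build_facet_query("http://localhost:8983/solr", "drupal", ["author", "taxonomy"])
print(url)
```

Each field listed this way comes back with its own list of values and hit counts, which is what a facet block displays.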

Drupal projects that already provide some level of integration with Lucene and/or Nutch:

Chat Module

private
Sciera - Thu, 2009-07-02 01:26

Hi Guys,

Can anyone recommend a good chat solution for Drupal 6? I am evaluating different solutions such as DimDim, 123FlashChat Server, and avchat (avchat.com).

Has anyone used any of these solutions? Would you recommend any?

I will be using this for a learning management solution. I would like to host online presentations, do whiteboarding, and control when chat sessions are open. All chat sessions will need to be stored and indexed for later retrieval. I will be using the Apache Solr module; that part seems to work fine.

How to implement apache solr for multiple sites

private
jatindercheema - Fri, 2009-06-19 05:46

Hi All,

I want to know whether we can use a single Apache Solr instance for multiple sites, and what changes are required to do so.

Jatinder


RDF for Solr: Possible implementation strategies

private
drunken monkey@... - Fri, 2009-06-12 17:29

(For information about my project, see here. In short, it's about enabling Solr to index RDF data via Drupal.)

Before starting the actual coding, even on prototypes, the basic options for implementing this will have to be discussed. At the moment, my mentors and I see the following three possibilities:


DoubleClick ad server

private
jatindercheema - Fri, 2009-06-05 06:10

Hi All,

After adding the DoubleClick ad server for ads on my site, page load time has increased and performance is slower than before.

I want to know how to improve page load performance when using the DoubleClick ad server.

Best,
Jatinder Cheema


Single Apache-Solr for multiple sites

private
jatindercheema - Mon, 2009-06-01 07:05

Hi All,

I need help configuring a single apache-solr-nightly engine for multiple sites with different languages.

Please help me configure Apache Solr for multiple sites.

Jatinder
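
One common way to share a single Solr instance is to tag every indexed document with the site it came from and restrict each search with a filter query. The sketch below assumes hypothetical `site_id` and `language` fields that each site would have to write at indexing time; it is an illustration of the idea, not a description of what the ApacheSolr module does.

```python
from urllib.parse import urlencode

def site_search_url(base_url, keywords, site_id, language=None):
    """Build a select URL restricted to one site's documents."""
    # fq is Solr's filter query parameter; it can be repeated.
    params = [("q", keywords), ("fq", f"site_id:{site_id}")]
    if language:
        params.append(("fq", f"language:{language}"))
    return base_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical site identifier and language code.
multisite_url = site_search_url("http://localhost:8983/solr", "news", "site_a", "de")
print(multisite_url)
```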


Adding RDF Support to the ApacheSolr module

private

Project information

Project page on drupal.org: ApacheSolr RDF Support
Student: Thomas Seidl (drunken monkey on d.o)
Mentor: Robert Douglass (robertDouglass)
Co-mentor: Stephane Corlosquet (scor)
Local mentor: Wolfgang Ziegler (fago)

Current status: Implementing first prototype

Description

This project will improve the ApacheSolr module by enabling it to handle (i.e., index and search with a comfortable UI) any kind of RDF data. This will instantly make it possible to provide meaningful searches for all site content that isn't node-centric, as well as content from anywhere else on the web. Only an RDF class description and a way to access the data would have to be provided (apart from the normal Solr requirements) and the module would automatically do the rest of the work.
More information can be found in the original discussion.

Anyone working on the nutch module for 6?

private
pearlbear@drupal.org - Fri, 2009-05-01 16:13

Hi there,

Anyone working on getting the nutch module working for Drupal 6? Any folks know of other avenues to get full-text document search (.pdf, .doc, etc.) in Drupal 6?

Thanks!


Multisite Search using ApacheSolr module

private
auxin - Thu, 2009-04-30 16:40

Hi,

Can anyone let me know if it is possible to index and search multiple Drupal and non-Drupal websites using the ApacheSolr module?

If not please let me know of any other way that this could be achieved.

Thanks

Problem while implementing Lucene

public
jatindercheema - Thu, 2009-04-02 13:41

Hi all,

I am facing a problem while implementing Lucene search on my site. Can anybody help me with it? I have added the Zend Framework for Lucene search to my Drupal site, but I still get the following error when adding the Search Lucene API module:

"The required Zend Framework components of Search Lucene API are not installed. (Currently using Zend Framework components not installed) "

Please comment with your solution and with the site where I can get the components for Search Lucene API.


Solr RDF Support

public
drunken monkey@... - Thu, 2009-03-26 14:11

Overview

This project is about adding RDF support to the popular ApacheSolr module in the form of a Solr RDF contrib module. The module should be able to read an RDF class specification and automatically generate the necessary mapping to a Solr server, provide the capability to search resources of that type, and also generate facets based on its properties. It would even be possible to build the existing node search capabilities completely on top of this mechanism! In any case, you could also add arbitrary other types like users or taxonomy terms, or resources from other websites altogether.
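
To make the idea of an automatically generated mapping more concrete, here is a minimal sketch that turns RDF property URIs into Solr field names based on their datatype. The suffixes follow the dynamicField patterns found in Solr's example schema (`*_s`, `*_i`, `*_dt`, `*_t`), and the function names and sample properties are hypothetical; the actual module may map things quite differently.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed suffixes, modelled on the dynamicField patterns in Solr's example schema.
SUFFIX_BY_DATATYPE = {
    XSD + "string": "_s",
    XSD + "integer": "_i",
    XSD + "dateTime": "_dt",
}

def solr_field_name(property_uri, datatype_uri):
    """Derive a Solr field name from the local part of a property URI."""
    local = property_uri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    # Unknown datatypes fall back to a tokenized text field.
    suffix = SUFFIX_BY_DATATYPE.get(datatype_uri, "_t")
    return local.lower() + suffix

def map_class(properties):
    """Map every property of an RDF class description to a Solr field."""
    return {uri: solr_field_name(uri, dtype) for uri, dtype in properties.items()}

# Hypothetical class description: property URI -> datatype URI.
person = {
    "http://xmlns.com/foaf/0.1/name": XSD + "string",
    "http://xmlns.com/foaf/0.1/age": XSD + "integer",
}
mapping = map_class(person)
print(mapping)
```

With a mapping like this, string and integer properties would be natural facet candidates, while free-text properties would feed the full-text search.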


Search results display

public
narsing - Thu, 2009-03-26 06:10

Hi,
I am new to Drupal and am facing a few issues with the apachesolr module. I am using the apachesolr-6.x-1.0-beta5 module and Solr 1.3.

1) When I search for any content, the results are displayed only as links, not as teasers.
2) When I search for a title, the results are empty.
3) The "Spell check" feature does not work after I check the checkbox on the apachesolr settings page.

Are there any settings that I need to change? Please help me.


the old opensearchclient module

public
erle@drupal.org - Fri, 2009-02-13 01:03

Just wanted to write to get the views of all here on the opensearchclient module.

I've recently been handed the task of porting a hacked version (was done previously, not by me) of the opensearchclient module which is needed for a site I'm upgrading to D6. The hacked version has been ported, and I noticed the original code has been abandoned.


Help backport ApacheSolr D6 to D5

public
robertDouglass - Thu, 2009-02-05 21:48

Hi Everyone,

Today I took a couple hours and attempted a backport of ApacheSolr D6 to D5. This will bring all of the cool features that D6 has to D5. The patch needs work. It doesn't yet work. If anyone has time to chip in and work on the indexing of nodes (currently not working) that'd be great. With a little momentum I think we can do the backport relatively quickly, after which keeping the two in sync will be much easier.

http://drupal.org/node/337735#comment-1240921


Using Apachesolr module as an API

public
toemaz@drupal.org - Wed, 2009-01-07 12:55

I just want to share my experience with using the apachesolr module as an API. I use the apachesolr module purely as an interface between my own Drupal modules and the Solr instance. Part of the reason for doing so is that I use a different schema.xml from the one provided with the apachesolr module.

For the single technical detail, read this issue.

Thanks a lot to the developers for creating the apachesolr module and making it nicely extensible!


Running Solr as a service on Debian

public
toemaz@drupal.org - Sun, 2009-01-04 19:19

I was looking for a way to run Solr as a service on Debian. Ez Publish CMS has some interesting scripts in their svn repository http://svn.ez.no/svn/extensions/ezfind/ezp4/trunk/extension/ezfind/bin/s...
http://svn.ez.no/svn/extensions/ezfind/ezp4/trunk/extension/ezfind/java/

I used the solr script from the first link together with the solr.sh in the second, followed the instructions in the solr script and it works well.

Does anybody have another or better solution?


Does the order of fields in the search query matter in Solr?

public
toemaz@drupal.org - Mon, 2008-12-22 08:29

Query 1:

field1:value1 field2:value2

Query 1 reversed:

field2:value2 field1:value1

Does it make a difference? Would it be better for performance to use the field first which yields the smallest sub result set? Or does Solr/Lucene handle this all by itself?
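
As far as the set of matching documents goes, clause order cannot matter, because combining posting lists is a commutative operation. The toy sketch below (plain Python sets, not Solr or Lucene internals) shows an AND-style lookup that sorts the posting lists by size before intersecting, so the caller's ordering is irrelevant. The index contents are made up, and note that Solr's default operator may be OR rather than AND depending on the schema.

```python
def search_all(index, terms):
    """Return the documents that contain every term (toy AND query)."""
    # Sort posting lists by size so the smallest one bounds the work,
    # whatever order the caller wrote the clauses in.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # nothing left to intersect
    return result

# Made-up inverted index: term -> ids of documents containing it.
index = {
    "field1:value1": {1, 2, 3, 4, 5, 6},
    "field2:value2": {2, 9},
}
forward = search_all(index, ["field1:value1", "field2:value2"])
reverse = search_all(index, ["field2:value2", "field1:value1"])
print(forward == reverse)
```

Whether and how Lucene reorders clauses internally is a separate question that this sketch does not answer.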


Search through buddy nodes

public
toemaz@drupal.org - Mon, 2008-11-24 19:35

Has anyone ever tried, or succeeded, to configure Drupal + Solr to search through the nodes of buddies, that is, to integrate with buddylist or any other user relationship module? Would it even be possible?


Awesome jQuery + Solr integration

public
robertDouglass - Wed, 2008-11-05 12:26

Check out this awesome jQuery/Solr library that David Peterson pointed me to: http://solrjs.solrstuff.org/test/reuters/


Searching over multiple (heterogenous) indexes

public
robertDouglass - Sun, 2008-08-17 08:39

(repost from http://drupal.org/node/296198 by drunken monkey)

While implementing the attachment indexing mechanism, we (febbraro, robertDouglass and I) stumbled across a problem: how to store the attachment text?
It would be easy to just append it to the "text" field, add a new multi-valued field, or both. But then it would be impossible to tell at search time where a term occurred, which unfortunately is a requirement, since the attachments themselves should appear directly in the search results, not just links to the nodes containing them.
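
One alternative that keeps the place of occurrence visible is to index each attachment as its own Solr document that points back to its node. The sketch below builds a Solr XML `<add>` message along those lines; the field names `type` and `parent_nid` and the id scheme are assumptions made for illustration, and this is not necessarily the approach the authors settled on.

```python
import xml.etree.ElementTree as ET

def build_add_message(node, attachments):
    """Build a Solr <add> message with one doc per node and per attachment."""
    docs = [{
        "id": f"node/{node['nid']}",
        "type": "node",
        "title": node["title"],
        "text": node["body"],
    }]
    for position, attachment in enumerate(attachments):
        docs.append({
            "id": f"node/{node['nid']}/file/{position}",
            "type": "attachment",              # lets results tell hits apart
            "parent_nid": str(node["nid"]),    # link back to the node
            "title": attachment["filename"],
            "text": attachment["text"],
        })
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")

# Made-up node and attachment data.
message = build_add_message(
    {"nid": 42, "title": "Annual report", "body": "See the attached PDF."},
    [{"filename": "report.pdf", "text": "Revenue grew in every quarter."}],
)
print(message)
```

The cost of this design is that a node and its attachments are scored separately, so grouping them in the result list becomes the front end's job.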


MySQL configuration

public
Shyamala@drupal.org - Sat, 2008-08-09 06:26

We have a database Server Configuration:
4GB RAM
600GB Hard Disk
Xeon Processor 1.3 GHz

We are barely able to support 100 concurrent users! What are we doing wrong?

I know I need to configure mysql_query cache, mysql_limit_size and table_cache. But what formula should we use, and how do we go about checking the values?

Below are the details of our my.ini file.

[mysqld]
datadir=/database/data
socket=/var/lib/mysql/mysql.sock
set-variable=max_connections=2000
set-variable = max_allowed_packet=64M
default-storage-engine = innodb
log-bin=/database/data/mysql-bin
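
A widely used rule of thumb for sizing MySQL is that worst-case memory is roughly the global buffers plus max_connections times the per-connection buffers. The sketch below applies it to the max_connections=2000 setting and the 4 GB of RAM quoted above. Every buffer size in it is an illustrative assumption, since the my.ini excerpt does not set them, and the result is an upper bound that assumes every connection allocates every buffer at once.

```python
MB = 1024 * 1024

def worst_case_memory(global_buffers, per_connection_buffers, max_connections):
    """Rough upper bound on MySQL memory use, in bytes."""
    return (sum(global_buffers.values())
            + max_connections * sum(per_connection_buffers.values()))

# Illustrative values only; none of these come from the posted my.ini.
global_buffers = {
    "innodb_buffer_pool_size": 2048 * MB,
    "key_buffer_size": 64 * MB,
    "query_cache_size": 64 * MB,
}
per_connection_buffers = {
    "sort_buffer_size": 2 * MB,
    "read_buffer_size": 1 * MB,
    "read_rnd_buffer_size": 1 * MB,
    "join_buffer_size": 1 * MB,
}

total = worst_case_memory(global_buffers, per_connection_buffers, 2000)
print(f"worst case: {total / (1024 * MB):.1f} GB on a 4 GB server")
```

Under these assumed sizes the bound lands far above the installed RAM, which suggests checking whether 2000 connections are really needed before tuning individual caches.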


Benchmarking in Drupal

public
Shyamala@drupal.org - Fri, 2008-08-01 12:20

Drupal and MySQL are located on two different servers:

Configuration: Drupal server: Dual core processor, 4GB RAM
MySQL Server: Xeon Processor, 4GB RAM

MySQL Enterprise edition 5

In an exercise to pre-populate the database with 1 million records, record insertions into the tables are very slow because the requests are queuing up in the database. Insertion is done using a special tool that records our Drupal application and plays back the scripts in a loop to populate the database.

Please see the below data collected by the team.


Benchmark Drupal 6 search and Solr search

public
Shyamala@drupal.org - Wed, 2008-07-23 07:46

Could you validate our scenarios and server configuration? We will share the results with the community soon.

Benchmark scenarios:

Drupal 6.0 Search + Statistics + simul. Cron + replication
Solr Search + Statistics + simul. Cron + replication
Drupal 6.0 Search/ Solr Search with Statistics + replication
Drupal 6.0 Search/ Solr Search + simul. Cron + replication

Server configuration:

Drupal site and DB are to be placed on two dual-core servers with 4 GB RAM each

Software details: Linux - RHEL 5, Apache 2.2.3, MySQL Enterprise 5, PHP 5.1.6


ApacheSolr Alpha 3 released

public
robertDouglass - Sat, 2008-07-19 14:41

With the help of Drunken Monkey and many others we've come up with an Alpha 3 of the ApacheSolr module. In addition to lots of bugfixes and a performance improvement, there is a new feature: you can set the number of facets per block. Check it out:
http://drupal.org/project/apachesolr


Search for a large Job portal

public
Shyamala@drupal.org - Sat, 2008-07-12 08:46

I need some clarification on the best search algorithms to use. I work with Netlink Technologies. We are currently planning to architect a large job portal in Drupal. We have convinced our organization to use Drupal 6.0 and create custom nodes and modules. We are also planning to benchmark the different search options that we could adopt.

For search, we are trying to understand ApacheSolr and Sphinx search.

Do you think we are proceeding in the right direction? Will Drupal with Solr be a scalable option for a large job portal?

Shyamala
Tech Head


New questions and observations

public
urbanarpad - Fri, 2008-07-11 18:13

I hate to mess up this nice groups page with newbie issues but looks like I will.

The ApacheSolr module looks to be very cool but it's not what I need. It's one thing to backend all of Drupal with Solr but it's something altogether different (I think) to integrate Drupal with an existing Solr implementation.

ApacheSolr Search and other 3rd party Drupal search options

public
robertDouglass - Fri, 2008-07-11 10:38

This is a BoF discussion that is planned for August 24, 2008 at 16:30 - 17:45 as a part of the FrOSCon conference in Sankt Augustin, Germany.

  • What improvements have gone into Drupal search as a result of the Minnesota Search Sprint?
  • ApacheSolr demonstration
  • Xapian search and Drupal.org

Search is cool. Come talk about it.


SearchAPI Module

public
BlakeLucchesi - Wed, 2008-03-26 19:10

The following is my first revision of a proposal to create a search API module. I'd love to get some feedback.

Project Details
A Drupal search API would allow for separation between the search interface that end users interact with and the back-end indexing and retrieval work that a search engine performs. The advantages to creating a search API are:


Improving the Apache Solr Search Integration module

public
drunken monkey@... - Wed, 2008-03-26 16:47

I am planning to hand in a proposal on improving the Apache Solr Search Integration module.
The project would include:

  • Porting the module to drupal 6 (if necessary)
  • Integration in Views 2, enabling the use of Views 2 as a front-end to display the search results
  • Writing simpletest unit tests for this module, especially for the new functionality

What's your opinion on that? I have already contacted Robert Douglass to ask for his.


Refactoring core search

public

Drupal's search APIs received some good attention at the recent Boston Drupalcon. Following up on discussions there, here is an attempt to draw together ideas on directions for refactoring core search. Please wade in and add your ideas and observations.

Existing core search

Drupal core search is implemented in an integrated way, providing a powerful working solution but little flexibility. Core search integrates several distinct pieces, among them:

  • For nodes, a custom SQL-based indexing solution.
  • For nodes, an SQL-based search algorithm.

Building a killer search for Drupal

public

We've had a good discussion today at Drupalcon, in a BoF session led by Robert Douglass. Here's the plan that emerged to build a killer search for Drupal that will help take us Drupalers further towards world domination. ;)

New Solr module available for testing

public
robertDouglass - Tue, 2007-10-09 11:35

I've started to write a new module for Solr integration. After finally getting around to testing and trying it extensively, I can say that Solr is one of the coolest things I've seen in the search space. The module that I've written departs from the current Solr project on Drupal.org in that it doesn't conflict with core search, but rather plugs into the core search framework. I need people who know a lot about Solr to look at my work and help me figure some things out:

  • How best to support multiple search indexes?

Non Java options?

public
sodani - Wed, 2007-09-19 21:29

Lucene and Nutch may sound good if you're familiar with Java, but what if you're not? Since Drupal is written in PHP, are there any PHP crawlers that Drupal might integrate well with? I've tried phpdig but found it to be slow and not well supported.

I've also found rdig, a Ruby module, but it doesn't seem to have much documentation or support. I'd love to hear other people's opinions on this.

New Zend Framework 1.0.0 RC

public
jvandervort - Wed, 2007-05-30 19:06

http://framework.zend.com/

Some Lucene goodies:

  • Zend_Search_Lucene
    ZF-1262 CaseInsensitive.php missing require_once for the class it extends
    ZF-1263 DocumentWriter.php missing require_once for the class it extends
    ZF-1264 Default.php missing require_once for the class it extends
    ZF-1350 testUtf8(Zend_Search_Lucene_AnalysisTest) failing
    ZF-1351 testUtf8Num(Zend_Search_Lucene_AnalysisTest) failing
    ZF-1365 Zend_Search_Lucene::close() not setting or checking the $_close flag
    ZF-1376 Error in sample code of section 27.3.1.1 - Query Parsing

University assignment

public
zac1 - Sun, 2007-04-01 07:38

Hello All,

I've got a university assignment to create a dynamic URL categorization tool: the ability to match each website to one of 60 pre-defined categories. We already have categorized URLs from DMOZ.

Since I am a great Drupal lover, I thought I might combine Drupal, Lucene, Nutch, and some Bayesian/SVM AI to create this application.

Is anyone familiar with such a feature in Lucene, or with some integration? Any comments are welcome.

Thanks a lot !
Zac.
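
For the Bayesian part, a multinomial naive Bayes classifier with add-one smoothing is a common starting point for assigning text to a fixed set of categories. The sketch below is a self-contained toy version; the class name, the training sentences, and the category labels are all made up, and in the setup described above the training text would presumably come from crawled pages behind the DMOZ-categorized URLs.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs
        self.vocab = set()

    def train(self, category, text):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best

# Made-up training data standing in for crawled page text.
classifier = NaiveBayes()
classifier.train("sports", "football league match score")
classifier.train("computers", "linux kernel software download")
print(classifier.classify("linux software"))
```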

CCK fields and custom searching

public
jvandervort - Fri, 2007-03-30 22:13

Any ideas on using Lucene indexing with CCK fields, custom searches, and field weighting?

Just curious...

New Zend Framework 0.9.0

public
jvandervort - Sun, 2007-03-18 14:46

Zend Framework Beta 0.9.0

Zend_Search_Lucene: now matches the performance of Java Lucene

or so they say. We'll see...

  • Zend_Search_Lucene
    ZF-96 Implement Search Highlighting
    ZF-295 Implement score normalisation
    ZF-626 Exception when adding document using static variables
    ZF-693 Using unoptimized indexing database damages storage
    ZF-943 Java examples are no longer necessary.
    ZF-1002 Document deleting/updating problem
    ZF-1050 Result sorting problem

New Zend Framework 0.8.0

public
jvandervort - Tue, 2007-02-27 19:03

For those keeping up-to-date. I'll be loading it next week.

They are claiming: Great performance improvement for Zend_Search_Lucene.

I'll be presenting "search" in Sunnyvale

public
robertDouglass - Sat, 2007-02-10 14:34

The OS-CMS (aka Yahoo! Drupalcon) is coming up, and I'll be presenting Drupal's search capabilities. I intend to demo Lucene too, whether it is code from one of the J's or just some demo code I put together for the purpose. I may be calling on you for help, too! Anyway, just wanted to let people know. Come to the conference, it'll be fun!


Check out the Solr project!

public
robertDouglass - Wed, 2007-02-07 22:54

Whooohooo, these are exciting times for those interested in Lucene =)

http://drupal.org/project/solr

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

http://lucene.apache.org/solr/
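
To give a feel for the JSON API mentioned above, here is a minimal sketch that reads facet counts out of a Solr JSON response. By default Solr returns each facet field as a flat list that alternates values and counts, which is what the helper unpacks. The sample response is hand-written for the example, and the `author` field is an assumption.

```python
import json

def facet_counts(response_text, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

# Hand-written sample in the shape of a Solr JSON response.
sample_response = """
{
  "responseHeader": {"status": 0},
  "response": {"numFound": 3, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {"author": ["alice", 2, "bob", 1]}
  }
}
"""
authors = facet_counts(sample_response, "author")
print(authors)
```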

