Lucene, Nutch and Solr


Lucene is a fabulous indexer, Nutch is a superb web crawler, and Solr can tie them together and offer world-class searching. This group discusses the various projects and efforts being made to integrate these technologies with Drupal.

The ApacheSolr module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is the ability to have faceted search on facets ranging from content author to taxonomy to arbitrary CCK fields.
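To make the idea of faceted search concrete, here is a minimal sketch of what a facet count is: for a chosen field, how many matching documents carry each value. Solr computes this inside the index; the toy below does it over an in-memory list only to illustrate the concept. The document fields (`author`, `tags`) and the helper name `facet_counts` are made up for the example.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each value of `field`."""
    counts = Counter()
    for doc in documents:
        values = doc.get(field, [])
        # Treat a single value the same way as a one-item list.
        if not isinstance(values, (list, tuple)):
            values = [values]
        counts.update(values)
    return counts


# Hypothetical search results; a real index would hold these in Solr.
docs = [
    {"title": "Solr basics", "author": "alice", "tags": ["search", "solr"]},
    {"title": "Nutch crawl", "author": "bob", "tags": ["search", "nutch"]},
    {"title": "Facets", "author": "alice", "tags": ["solr"]},
]

print(dict(facet_counts(docs, "author")))
print(dict(facet_counts(docs, "tags")))
```

Each facet block on a search page is essentially one such count, shown as a list of values the user can click to narrow the results.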

Drupal projects that already provide some level of integration with Lucene and/or Nutch:

fp's picture

Solr in non-English

I am trying to run apachesolr on a site which for now has only French content.

I have attached both the schema.xml I use and the query results from solr for a query on the word "Vidéocassettes".

From what I have gathered so far, I assumed that the following filter

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
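As a rough illustration of what an accent-mapping filter is meant to achieve, the sketch below folds accented characters to their base letters before comparing a query with an indexed term. It uses Unicode decomposition from Python's standard library, which is an assumption made for the example and only approximates a mapping file (it will not expand ligatures, for instance). The helper names are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Drop combining accent marks, so that 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_term(query, indexed):
    """Compare two terms after accent folding and lowercasing."""
    return fold_accents(query).lower() == fold_accents(indexed).lower()


print(fold_accents("Vidéocassettes"))
print(same_term("videocassettes", "Vidéocassettes"))
```

For matching to work, the same folding has to happen both when content is indexed and when the query is analyzed; folding on only one side leaves accented and unaccented forms as different terms.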
katbailey's picture

Telling Solr to ignore certain patterns in indexed fields when querying them

We need to implement html node titles with bbcode (as per http://drupal.org/node/28537) for a client site that's using ApacheSolr. The titles need to be displayed in search results with their html intact so the bbcode version has to get indexed. I need to tell Solr to ignore this code when trying to match queries. For example, if a user searches for "blue smurf" (with the quotation marks), and there's a node with the title "[strong]blue[/strong] smurf" in the index, Solr needs to recognise this as a match.
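One way to think about the requirement is as a normalization step that removes the markup before matching while leaving the stored title untouched. The sketch below shows that idea outside Solr; the regular expression and helper names are assumptions for illustration, and where such a pattern would be configured in a given Solr version is something to check in its documentation.

```python
import re

# Matches simple bbcode tags such as [strong], [/strong] or [url=...].
BBCODE_TAG = re.compile(r"\[/?[a-zA-Z][a-zA-Z0-9]*(?:=[^\]]*)?\]")


def strip_bbcode(text):
    """Remove bbcode tags, keeping the text between them."""
    return BBCODE_TAG.sub("", text)


def phrase_matches(phrase, title):
    """Check for an exact phrase in the title once bbcode is ignored."""
    normalized_title = " ".join(strip_bbcode(title).lower().split())
    normalized_phrase = " ".join(phrase.lower().split())
    return normalized_phrase in normalized_title


stored_title = "[strong]blue[/strong] smurf"
print(strip_bbcode(stored_title))
print(phrase_matches("blue smurf", stored_title))
```

The stored value keeps its bbcode for display; only the text used for matching is stripped.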

cpliakas's picture

Introducing Search Lucene API 2.0

My name is Chris Pliakas, and I am the author of the Search Lucene API module. Although the project has been around for a little while, 2.0 development is coming to a close, and I think it is the appropriate time to make the Drupal community aware of its existence.

Todd Young's picture

Need Solr Whiz for Early-Stage Guidance

I need a suggestion on how to proceed with a new project from someone who understands the architecture and capabilities of Solr within Drupal.

chrisrikli's picture

How to display all results with facets

I've been trying to figure out how to display a page that shows all of the results, a la what you see when you click "modules" at drupal.org. Can anyone point me in the right direction?

ducdebreme's picture

Optimizing Apachesolr for non-English languages

I have done a lot of research on optimizing Apachesolr for non-English languages, especially German. It turns out that search results can be dramatically improved by adjusting Solr's stemming and by breaking up compound words. This can be achieved easily with slight changes to Apachesolr's schema.xml.

This post is about configuring stemming:
http://www.early-dance.de/news/9188-optimizing-apachesolr-non-english-la...

And this post is about compound word splitting, which is needed in languages like German that have long combined words like "Dampfschifffahrt":
http://www.early-dance.de/news/9189-apachesolr-issues-german-and-other-g...
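To show what compound word splitting means in practice, here is a small dictionary-based sketch. It is not the configuration described in the linked post; the word list, the `min_part` limit and the function name are assumptions for the example, and it ignores linking letters such as the German "s" between parts.

```python
def split_compound(word, dictionary, min_part=3):
    """Split `word` into dictionary words, trying the longest part first.

    Returns the list of parts, or [word] if no complete split exists.
    """
    word = word.lower()

    def split_from(start):
        if start == len(word):
            return []
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in dictionary:
                rest = split_from(end)
                if rest is not None:
                    return [part] + rest
        return None

    parts = split_from(0)
    return parts if parts else [word]


# Hypothetical mini dictionary of German base words.
german_words = {"dampf", "schiff", "fahrt"}
print(split_compound("Dampfschifffahrt", german_words))
```

Once the parts are indexed alongside the full word, a search for "Schiff" can find a document that only contains "Dampfschifffahrt".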

Todd Young's picture

Architecture question re: huge indexes

I intend to have a Drupal 6 site with some custom node types indexed by ApacheSolr using the XML Schema provided. However, I'd like to add a lot of gigantic indexes to it that reside only in the Lucene/Solr system (ha) and not physically in my Drupal MySQL database. For example, I may have a thousand or even a hundred thousand nodes in Drupal, but I might have ten different "external" indexes with millions of records each, and I'd like to conduct faceted search against the whole lot of 'em.

geraud's picture

Trying to adapt "project_solr module for D6" and needing some help

I have this params array:

$params = array(
  // Fields to return for each result.
  'fl' => 'id,nid,title,body,format,comment_count,type,created,changed,score,url,uid,name,sis_project_release_usage,ds_project_latest_release,ds_project_latest_activity',
  // Number of results per page.
  'rows' => variable_get('apachesolr_rows', 10),
  // Faceting options.
  'facet' => 'true',
  'facet.mincount' => 1,
  'facet.sort' => 'true',
  'facet.field' => array(
    'im_vid_'. _project_get_vid(),
    'im_project_release_api_tids',
  ),
  'facet.limit' => 200,
  'sort' => $query->solrsort,
);
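For readers less familiar with how such an array reaches Solr, the sketch below shows how a comparable set of parameters could be serialized into a query string, with a list-valued key like `facet.field` repeated once per value. It is written in Python purely for illustration; the shortened `fl` value and the field name `im_vid_1` (standing in for the result of `_project_get_vid()`) are assumptions, and the PHP client library used by the module may serialize the request differently.

```python
from urllib.parse import parse_qs, urlencode


def to_query_string(params):
    """Serialize parameters, repeating a key for every item of a list value."""
    return urlencode(params, doseq=True)


# Hypothetical values modelled on the array above.
params = {
    "fl": "id,nid,title,score",
    "rows": 10,
    "facet": "true",
    "facet.mincount": 1,
    "facet.field": ["im_vid_1", "im_project_release_api_tids"],
    "facet.limit": 200,
}

query_string = to_query_string(params)
print(query_string)
# Round trip: both facet fields come back under the same key.
print(parse_qs(query_string)["facet.field"])
```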

dlo001's picture

Search URL

I am trying to integrate Lucene API search into my site, but I have run into an issue with my ad servers. The URL is example.com/search/luceneapi_node/{querystring}. The problem is that "luceneapi_node" in the URL generates ads that are related to Lucene and open source. This is a business journal magazine site with ads coming from various feeds and networks, so getting rid of or changing the URL path would be ideal.

How do I change the URL path, or create an Apache redirect, so that searches go to "/search/site/{querystring}" or "/search/qry/{querystring}"?


Apache Solr API discussion

Resources

Nitpicks

  • Dependency on search module
  • Do we need keywords as the path? Or can they just be GET parameters?
  • $params is GOOD because it is totally openly hackable
  • Do we want to create a set of new classes and each handles a type of search? Subclasses of a generic class?
  • Do we need to replace or rewrite the PHP library?
  • Is there a way to add facets without a custom module?
  • Facets tend to just be displayed as lists.
pips1's picture

Multi language search BoF session at DrupalCon Paris

Start: 2009-09-03 10:00 - 11:00 Europe/Paris
Event type: DrupalCon

Dear Drupalistas attending DrupalCon Paris,

Anyone who needs to be able to search content on a multi language site (via Apache Solr or Drupal's built-in search), please join the discussion at the BoF session

http://paris2009.drupalcon.org/session/multi-language-search

The session takes place on Thu 3 Sept 2009, 10:00 - 11:00, Rockefeller room (during second keynote session).

Please let us know if you intend to come by clicking the signup button. Thanks!

ducdebreme's picture

Apachesolr and Stemming in other languages

When I search for "tools" and for "tool" with Apachesolr, I get the same results. As far as I know, the reason is Apachesolr's stemmer, which reduces the word "tools" to "tool" and uses that for the search within the index. As far as I know, the stemmer is only aware of English stemming rules.

So how do things work when using Apachesolr with other languages? For example, I tried the German words "Kunden" (clients, plural) and "Kunde" (client, singular) and got different result sets. Obviously, the stemmer doesn't know the German stemming rules ... right?

How can the search results be improved for languages other than English? Are there German stemmers available to plug into Apachesolr?
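The behaviour described above can be reproduced with a toy suffix stripper. This is not the stemmer Solr uses; the suffix lists and the function name are assumptions chosen only to show why English rules leave "Kunden" and "Kunde" as different terms, while language-specific rules bring them together.

```python
# Deliberately tiny suffix lists, for illustration only.
ENGLISH_SUFFIXES = ("es", "s")
GERMAN_SUFFIXES = ("en", "n", "e")


def naive_stem(word, suffixes, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# English rules unify tool/tools but not Kunde/Kunden.
print(naive_stem("tools", ENGLISH_SUFFIXES), naive_stem("tool", ENGLISH_SUFFIXES))
print(naive_stem("Kunden", ENGLISH_SUFFIXES), naive_stem("Kunde", ENGLISH_SUFFIXES))
# German rules map both forms to the same stem.
print(naive_stem("Kunden", GERMAN_SUFFIXES), naive_stem("Kunde", GERMAN_SUFFIXES))
```

A real stemmer applies far more rules than this, but the principle is the same: both the indexed text and the query must pass through rules that fit the language of the content.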

tiffanyshack's picture

August Northern Virginia Drupal Meetup - featuring Chuck D'Antonio from Acquia

Start: 2009-08-05 18:30 - 21:30 America/New_York
Event type: User group meeting

Come join us for a presentation and discussion of Solr. We will talk about why it's better than core Drupal search and how it improves the user experience, and we will see it in action.

And we're very excited that Chuck D'Antonio, Acquia's Senior Director of Professional Services, will discuss Acquia Search.

Location:
RHODESIDE GRILL - downstairs
1836 Wilson Blvd
Arlington, VA 22201
(703) 243-0145

drunken monkey's picture

RDF for Solr: Possible improvements

The Apache Solr RDF module is now in a state where it can, in theory, already be used. However, there is much room for improvement, so I'd like to discuss some possible ways to achieve it.

Sciera's picture

Chat Module

Hi Guys,

Can anyone recommend a good chat solution for Drupal 6? I am evaluating different solutions such as DimDim, 123FlashChat Server and avchat (avchat.com).

Has anyone used any of these solutions? Would you recommend any?

I will be using this for a Learning Management solution. I would like to host online presentations, do whiteboarding, and be able to control when the chat sessions are open. All chat sessions will need to be stored and indexed for later retrieval. I will be using the apache solr module. That part seems to work fine.

jatindercheema's picture

How to implement apache solr for multiple sites

Hi All,

I want to know whether we can use a single Apache Solr instance for multiple sites, and what changes would be required.

Jatinder

drunken monkey's picture

RDF for Solr: Possible implementation strategies

(For information about my project, see here. In short, it's about enabling Solr to index RDF data via Drupal.)

Before starting the actual coding, even on prototypes, the basic options for implementing this will have to be discussed. At the moment, my mentors and I see the following three possibilities:

jatindercheema's picture

Double click ad server

Hi All,

After adding the DoubleClick ad server for ads on my site, page load time has increased and performance is slower than before.

I want to know how to improve page load performance while using the DoubleClick ad server.

Best,
Jatinder Cheema

jatindercheema's picture

Single Apache-Solr for multiple sites

Hi All,

I would like help configuring a single apache-solr-nightly engine for multiple sites with different languages.

Please help me configure Apache Solr for multiple sites.

Jatinder
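One common pattern for sharing an index, sketched here under stated assumptions, is to store a site identifier (and optionally a language code) with every document and add filter queries so that each site only sees its own content. The field names `site_id` and `language` and the helper names are hypothetical; they would have to exist in the schema, and this is not necessarily how the ApacheSolr module handles it.

```python
from urllib.parse import parse_qs, urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_site_query(keywords, site_id, language=None):
    """Build a query string restricted to one site, optionally one language."""
    filters = ["site_id:" + quote_term(site_id)]
    if language is not None:
        filters.append("language:" + quote_term(language))
    # Each filter becomes its own fq parameter.
    return urlencode({"q": keywords, "fq": filters}, doseq=True)


query_string = build_site_query("kunde", "site-a", language="de")
print(query_string)
print(parse_qs(query_string)["fq"])
```

Every site sends the same kind of request to the same Solr instance and differs only in the filter values it supplies.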


Adding RDF Support to the ApacheSolr module

Project information

Project page on drupal.org: ApacheSolr RDF Support
Student: Thomas Seidl (drunken monkey on d.o)
Mentor: Robert Douglass (robertDouglass)
Co-mentor: Stephane Corlosquet (scor)
Local mentor: Wolfgang Ziegler (fago)

Current status: Adding features

Description

This project will improve the ApacheSolr module by enabling it to handle (i.e., index and search with a comfortable UI) any kind of RDF data. This will instantly make it possible to provide meaningful searches for all site content that isn't node-centric, as well as content from anywhere else on the web. Only an RDF class description and a way to access the data would have to be provided (apart from the normal Solr requirements) and the module would automatically do the rest of the work.
