Lucene, Nutch and Solr

Lucene is a fabulous indexer, Nutch is a superb web crawler, and Solr can tie them together and offer world class searching. This group discusses the various projects and efforts being made to integrate these technologies with Drupal.

The ApacheSolr module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is the ability to have faceted search on facets ranging from content author to taxonomy to arbitrary CCK fields.

Drupal projects that already provide some level of integration with Lucene and/or Nutch:

Beanjammin's picture

Index OG members?

I have a site that is based on Open Atrium and currently using solr for site search. I would like to be able to add organic group members to the search index so that I can search for users within a particular OG. I would appreciate suggestions on how best to approach this.

Solr is set up via the Chapter 3 open atrium features module here http://features.chapterthree.com/openatrium-apachesolr-search/1-0-0, which uses apachesolr, apachesolr_og, apachesolr_search, and apachesolr_nodeaccess. In addition, Open Atrium uses nodeprofile so users' profiles are already in the index.

What I have considered doing is adding users' profile nodes to the same groups that they belong to. This would then make them searchable by OG, however my concern is that I would like everyone to be able to search for users across all OGs, including private OGs, and this would result in permission issues.

I would really appreciate any suggestions. Thanks.

jimijamesi's picture

AJAX Guided Form Facets

I am using Drupal 6.22 / module apacheSolr 1.5 / Java based Solr 3.1
I would like to extend the default solr search form to include location input and guided facets in an expanded state (displayed via AJAX as part of the form, before any search results are displayed):

1) Location input (postal code field and slider to set proximity range connects to Solr Geo-spatial)
http://wiki.apache.org/solr/SpatialSearch
http://thedrupalblog.com/geospatial-apache-solr-searching-drupal-6-upgra...

2) AJAX Guided facets - based on term user types, with autosuggest module enabled.

cpliakas's picture

Introducing Facet API

The Problem

The search community is fragmented. The problem stems from a core search module that doesn't facilitate third party backends, so each project is forced to solve similar problems in slightly different ways. Each contributed module has its own isolated sub-community, which is detrimental to Drupal as a whole.

carn1x's picture

ApacheSolr Live Search / Nodereference Autocomplete

I'm looking for a way to implement live search results in a drop down menu. Visually it would look and act almost identical to apachesolr_autocomplete, however reporting the top 5 search results instead of search suggestions.

I'm hoping that an existing module is out there, although I've scoured as much as I can. I would like to develop the module myself, but the project I'm on simply doesn't have the budget for it :(

Thanks for any advice!

Roadmap for 7.x-1.x and 6.x-* for Apache Solr module

transferred from http://drupal.org/node/1090080

Please make your notes or changes inline, preferably to reflect IRC or other more real-time discussions.

We had a BoF and many discussions at DrupalCon Chicago 2011; here's the takeaway:

  • Abandon any attempt to keep the schema in sync with 6.x-1.x and remove most or all node-specific fields. Mostly done at http://drupal.org/node/1088208

  • get some Views integration asap (volunteers?)
  • support multiple sorts
  • Move filter URL params from ?filters= to multiple ?f[]= params
  • Improve UI w.r.t. settings per server
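
As a rough illustration of the move from `?filters=` to multiple `?f[]=` params listed above, the following Python sketch builds and parses both query-string styles. The filter values (`type:page`, `uid:1`) and the space-separated layout of the old `filters` parameter are assumptions made for this example, not details taken from the module.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical facet filters; these field:value pairs are made up for illustration.
filters = ["type:page", "uid:1"]

# Old style (assumed layout): every filter packed into a single "filters" parameter.
old_query = urlencode({"filters": " ".join(filters)})

# New style: one "f[]" parameter per filter.
new_query = urlencode([("f[]", value) for value in filters])

# Parsing the new style returns the filters as a list, with no custom splitting.
parsed_filters = parse_qs(new_query)["f[]"]

print(old_query)
print(new_query)
print(parsed_filters)
```

With one parameter per filter, a standard query-string parser can recover each filter without knowing how filters were joined, which is presumably part of the appeal of the change.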
    atomicjeep's picture

    Solr Multisite Search

    Hi There,
    This regards http://drupal.org/project/apachesolr_multisitesearch
    I'm trying to enable Multisite Search Facets (based on taxonomies) for a few sites. The 'normal' multisite facets such as 'filter by site', 'current search', etc. appear fine, but facets based on taxonomies do not appear no matter what I do. The facets are enabled, the blocks are added to regions, caches are cleared, etc.

    Tested in Drupal Core 6.20 with the latest stable version of Apache Solr & Apache Solr Multisite Search

    Has anyone else successfully enabled this functionality?

    Any help greatly appreciated

    bhp's picture

    Setting the locale through the Drupal Solr API?

    We have a website which is mostly English, but some of the fields in some of our content types are in another language. This other language has different rules for alphabetization, etc. We'd like to be able to sort searches on these fields using the appropriate locale.

    You can do this in the Solr schema.xml file by adding the appropriate locale="..." attribute to the field definitions. I'm wondering whether there's a way to do this through the Drupal API, so that we don't need to modify the schema.xml file. Has anyone else looked into this idea?

    April 8-13 2011 Apache Solr Search Integration sprint goals

    Please sign up at http://groups.drupal.org/node/138324

    For real-time chat or to connect with sprinters, join #drupal-apachesolr in IRC

    There will be two Skype kick-off calls of about 30 min each on Friday, April 8. One at 8 am EDT and one at 3 pm 4 pm EDT (hopefully that will span enough timezones). Join IRC beforehand and share your Skype name there to be called in.

    At the outset, some possible high-level goals are:

  • integration with Facet API
  • integration with Views
  • UI improvements
  • Expanded test coverage
    muschpusch's picture

    Trying apachesolr views but which views version to use?

    Hey,

    I already created an issue but got no reaction. Is anyone here using apachesolr_views? If so, please tell me which version of Views you use (or which dev snapshot). I tried 6.x-3.x-dev and alpha3, but they report errors.

    regards Volkan

    pwolanin's picture

    Apache Solr Search Integration virtual sprint

    Start: 
    2011-04-08 (All day) - 2011-04-13 (All day) America/New_York
    Event type: 
    User group meeting

    We are planning a virtual sprint focused on implementing new features for the 7.x version, as well as stabilizing it and moving towards an RC release.

    Additional work may begin on a 6.x-3.x branch.

    Watch this space for links to discussions and other organizing posts within the "Sprints" group.

    Sign up for this event if you are interested in participating.

    jonnyp's picture

    Upgrading servers with Solr / mySQL

    Hello,

    I'm about to move from development to launch on a project that uses Solr to index around 800,000 nodes. At present it is on a dedicated server with 4 GB of RAM, along with about 15 other websites I host. It is clear that under load I am experiencing slowdowns in MySQL from a lack of memory, and from what I've read, the RAM allocated to Solr is a major factor in how fast and responsive Solr searches are.

    What I'd like to know is

    • Should I be aiming to upgrade this server with more memory, or to launch a new box that is dedicated to mysql/solr/both?
    ygerasimov's picture

    Weird wildcard search with EDisMax handler

    I am using Solr 1.4.1 with EDisMax according to http://drupal.org/node/713142

    When I search for part of the word, I see results as expected. (In my case the search is for *verhuiz*.)
    But when I search for a larger part of the word, I get no results. (The next search is for *verhuizin*.)

    I know that the original word is "verhuizing" and that it exists in my documents.

    Please find attached responses and details of the Solr.

    Can anyone advise what might be wrong?
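
One possible explanation, offered as an assumption rather than a diagnosis of this setup: if the field is stemmed at index time and wildcard queries are matched against the stored terms without analysis, a longer wildcard fragment can stop matching. The Python sketch below mimics that with a toy suffix stripper (`toy_stem` is hypothetical and is not a real Dutch or Snowball stemmer) and plain glob matching in place of Solr.

```python
from fnmatch import fnmatchcase


def toy_stem(word):
    """Strip a few suffixes. A toy stand-in, NOT a real stemmer."""
    for suffix in ("ing", "en", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Index time: the analyzer stems the word before the term is stored.
indexed_terms = {toy_stem("verhuizing")}


def wildcard_search(pattern):
    # The wildcard pattern is compared with stored terms as-is (no stemming).
    return sorted(term for term in indexed_terms if fnmatchcase(term, pattern))


print(indexed_terms)
print(wildcard_search("*verhuiz*"))
print(wildcard_search("*verhuizin*"))
```

In this sketch the stored term is shorter than the original word, so the shorter fragment matches and the longer one does not. Whether that applies here depends on the analyzers configured for the field being searched.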

    ebeyrent's picture

    Parsing Views Feeds

    I am new to Nutch, and am attempting to parse a site that has two Views blocks on the front page, both also providing feeds.

    My first attempt to parse resulted in the following error:

    parser not found for contentType=application/xhtml+xml

    I attempted to fix this by editing conf/parse-plugins.xml, where I added:

    Now, when I attempt to parse, I get the following:

    maxmmize's picture

    Managed to lose Highlighting

    Hi,

    I've managed to lose highlighting in my description/teaser.

    http://drupal.org/node/970928
    http://drupal.org/node/968308

    I have it turned on in my solrconfig.xml line 183. It is set to true.

    It was working when I was using Jetty and Solr 1.4.0; then I switched to Tomcat and created a new Solr 1.4.1 instance. I modified my teaser length as noted above. I basically just crawled and kept the entire page using Nutch, passed it to Solr, and displayed my teaser with <?php print substr($snippet, 0, +400); ?> ...<br/>

    Install and Configure Nutch in 5 minutes

    Ok, here we go. This information is only relevant to those wishing to start out with Nutch for the first time, or to developers who test various Nutch functions and have to tear down and set up again to confirm results. There are many ways to do it, but this works for me. Also, there are scripts here, so run them at your own risk. If you don't know, ask.

    1.) Login to your server using ssh and create a dir named /stuff
    a.) mkdir /stuff
    b.) touch nutch.sh
    c.) vi nutch.sh

    broncomania's picture

    Make a relation between Nutch crawled websites and the user

    I am trying to crawl user websites and build a relationship between them in Solr. I am just beginning to learn Nutch and Solr, but I think this would be a really useful feature. Maybe someone has experience with this topic and can give me a hint on how to do this with Nutch, Solr, and Drupal.

    Start Solr Automatically at Boot

    Once you get Solr working in Drupal, you may find it handy to have your Solr instance start whenever your server is rebooted. Doing this is simple and is done from the Command Line Interface (CLI). For beginners (such as myself): you will need to have ssh access to your server. If you do not, ask your hosting company what your options are.

    SSH into your server.

    maxmmize's picture

    Installing Nutch

    Hi,

    I'm new to Drupal, new to Solr, new to Nutch. Thanks to Robert for his dedication in answering my questions.

    I installed Solr with Jetty and it is working. I am using Robert's module. (Thanks again)

    I have installed Nutch v1.4.0 inside of my /home/lib folder. I have read a lot of the documentation for Nutch. I installed the nutch crawler from Drupal. I run 6.x. I run a Centos box. I'm using Apache 2.0 and PHP 5 v2.9. I have Tomcat 5+ running.

    Obviously, since Solr is working, everything is fine for a Nutch install (with Tomcat).

    marashi's picture

    How to join Apache Solr docs?

    I'm working on a hotel booking website and I have two content types, named hotel and room. Each room is a child of a specific hotel (I have made the relationship with Node Reference).
    I have used Apache Solr to index and search the content, but because rooms and hotels are separate content types, they are indexed as separate documents (doc) in Apache Solr. So when I want to query something like "Show me hotels which have a room for less than $200 per night", it fails, because price is an attribute of room, not hotel, but my search is based on hotels!
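
One common way around this kind of problem, sketched here under assumptions, is to denormalize: copy each room's price onto its parent hotel's document at index time, so that a hotel-level query can filter on room prices. The Python below uses plain dictionaries rather than a Solr client, and every field name (`nid`, `hotel_nid`, `room_prices`) and sample value is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for hotel and room nodes.
hotels = [{"nid": 1, "title": "Hotel A"}, {"nid": 2, "title": "Hotel B"}]
rooms = [
    {"nid": 10, "hotel_nid": 1, "price": 150},
    {"nid": 11, "hotel_nid": 1, "price": 320},
    {"nid": 12, "hotel_nid": 2, "price": 250},
]


def build_hotel_docs(hotels, rooms):
    """Attach the prices of all child rooms to each hotel document."""
    prices = defaultdict(list)
    for room in rooms:
        prices[room["hotel_nid"]].append(room["price"])
    return [dict(hotel, room_prices=sorted(prices[hotel["nid"]])) for hotel in hotels]


def hotels_with_room_under(docs, max_price):
    """Return titles of hotels that have at least one room below max_price."""
    return [doc["title"] for doc in docs if any(p < max_price for p in doc["room_prices"])]


hotel_docs = build_hotel_docs(hotels, rooms)
print(hotels_with_room_under(hotel_docs, 200))
```

The trade-off of this approach is that a hotel document has to be re-indexed whenever one of its rooms changes price.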

    robertDouglass's picture

    Search API module is a home run, congratulations Thomas.

    I can tell by the usage statistics that you haven't tried the Search API module yet! Run, don't walk, to the download page and give it a go. You'll have to pick yourself up a copy of the Entity module as well, but then you're all set.
