Lucene, Nutch and Solr


Lucene is a fabulous indexer, Nutch is a superb web crawler, and Solr can tie them together and offer world-class searching. This group discusses the various projects and efforts being made to integrate these technologies with Drupal.

The ApacheSolr module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is the ability to have faceted search on facets ranging from content author to taxonomy to arbitrary CCK fields.

Drupal projects that already provide some level of integration with Lucene and/or Nutch:

transmitter's picture

Solr the right thing or too big?

Hi there,

We are running a local website that provides information about events, businesses, classifieds, news, etc. in one city.
To make it even more local, we need to categorize everything by the districts / areas within this city.
I want the users to be able to click on:
old town
And see the last three shops, ads, events, classifieds, etc. from the old town.
Clicking on 'all shops in the old town' (shops and old town should be arguments somehow) should show the shops in the old town (surprise ;) ).

At the moment I'm trying it with taxonomies, Views and Panels.

karma.code's picture

Wildcard searching with Dismax

Hey all,
A wildcard question... I am trying to implement an autocomplete text field backed by SOLR. So, when the user starts typing a term or a phrase, it will request SOLR results. For example, if a person types 'car', it should return 'car', 'cars', 'cardigan'.

I tried doing this:

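<?php

// A hook implementation is named after the implementing module;
// "mymodule" is a placeholder.
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  // $value is assumed to hold the text typed so far; the original post
  // does not show where it comes from.
  $query->add_filter('fieldname', $value . '*');
}
?>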
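
One possible direction, offered as a sketch rather than a tested answer: the DisMax parser does not interpret `*` in the main query as a wildcard, whereas a filter query (`fq`) is parsed with the standard Lucene syntax by default, where `car*` is a prefix query. The Python below only builds the request parameters and sends nothing. `fieldname` comes from the post; `build_prefix_params` is a hypothetical helper, and the lowercasing step is an assumption (prefix terms are typically not passed through the field's analyzer, so the term is lowercased to match an index that lowercases its tokens).

```python
import re
from urllib.parse import urlencode

# Characters that carry meaning in the Lucene query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def build_prefix_params(field, typed, rows=10):
    """Build Solr request parameters for a prefix match on one field."""
    # Assumption: the indexed tokens are lowercased.
    term = typed.strip().lower()
    # Backslash-escape special characters so user input cannot alter the query.
    escaped = _SPECIAL.sub(r'\\\1', term)
    return {
        'defType': 'dismax',
        'q.alt': '*:*',  # used by DisMax when no q parameter is sent
        'fq': '%s:%s*' % (field, escaped),
        'rows': rows,
    }


if __name__ == '__main__':
    print(urlencode(build_prefix_params('fieldname', 'car')))
```

A prefix filter like this restricts which documents match; returning the matching terms themselves ('car', 'cars', 'cardigan') would need a different mechanism, such as faceting with a prefix.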
nedjo's picture

Searchlight vs. Apache Solr Views

Currently admins wanting to add Solr to their Drupal site have two options: Apache Solr Views and Searchlight.

Both can be used to construct views that filter content (including faceted search) on the basis of Solr indexes. Both projects - particularly Searchlight - are in active development.

omahm's picture

Solr concept questions

Hi,

I have a couple of questions regarding Solr. I appreciate that some of these have been covered in http://groups.drupal.org/node/73583

I've attached a diagram of my Solr concept.

Basically, we have a number of existing websites, all running on Drupal 6, and we will continue to add to this collection using versions 6 and 7.
I would like to provide Solr search functionality that allows searching within the current site and also provides a global search across all sites from one central portal.

jpk's picture

Nutch links visualization

Hello,

I am interested in building a link visualization chart for my site using Nutch.

I used Nutch to crawl the site starting from the home page and have a few segments in the segments folder.

Now I need to create a UI which shows the traversal path that Nutch followed starting from the home page, with inbound and outbound links per page.

Is there any such tool already available that I can reuse?

If not, any pointers on how I should query the linkdb?

Thanks

JPK
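
On querying the linkdb, one hedged starting point is Nutch's `readlinkdb` command, which can dump the link database as text (for example `bin/nutch readlinkdb crawl/linkdb -dump linkdump`, where both paths are placeholders). The sketch below parses such a dump into inbound and outbound maps that a charting UI could consume. The line layout it expects (`<url>` TAB `Inlinks:` followed by indented `fromUrl: ... anchor: ...` lines) is an assumption about the dump format and should be checked against real output; `parse_linkdb_dump` is a hypothetical helper.

```python
from collections import defaultdict


def parse_linkdb_dump(lines):
    """Parse a text dump of the linkdb into inlink and outlink maps."""
    inlinks = defaultdict(set)   # target URL -> URLs linking to it
    outlinks = defaultdict(set)  # source URL -> URLs it links to
    target = None
    for raw in lines:
        line = raw.rstrip('\n')
        if line.endswith('Inlinks:'):
            # Assumed header line: "<target url>\tInlinks:"
            target = line.split('\t', 1)[0].strip()
        elif line.strip().startswith('fromUrl:') and target:
            # Assumed entry line: " fromUrl: <source url> anchor: <text>"
            body = line.strip()[len('fromUrl:'):]
            source = body.split(' anchor:', 1)[0].strip()
            inlinks[target].add(source)
            outlinks[source].add(target)
    return inlinks, outlinks
```

The outbound map here is only the inverse of the recorded inlinks. Nutch may be configured to skip links between pages on the same host when building the linkdb, in which case a single-site crawl would leave it nearly empty, so that setting is worth checking before relying on the dump.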

Nostrathomas's picture

Handling Aggregate Records/Roll-up in Solr

Can someone point me to the mechanism in Solr that might allow me to roll up or aggregate records for display? We have many items that are similar and only want to show a representative record to the user until they select that record.

As an example: we carry a polo shirt and have 15 records that represent the individual colors for that shirt. Does the query API provide any way to roll up the records based on a property, or do we need to flatten the representation of the shirt in the data model?
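
For what it is worth, Solr releases that include result grouping (also called field collapsing) can do this kind of roll-up at query time. The sketch below only assembles the query string; `style_id` is a hypothetical field that would hold the same value for all 15 color records of one shirt, and `build_grouped_query` is an illustrative helper, not part of any module.

```python
from urllib.parse import urlencode


def build_grouped_query(keywords, group_field, per_group=1):
    """Return a Solr query string that collapses results on group_field."""
    params = [
        ('q', keywords),
        ('group', 'true'),             # turn result grouping on
        ('group.field', group_field),  # collapse documents sharing this value
        ('group.limit', per_group),    # documents returned per group
        ('group.ngroups', 'true'),     # also report the number of groups
    ]
    return urlencode(params)


if __name__ == '__main__':
    print(build_grouped_query('polo shirt', 'style_id'))
```

With `group.limit` set to 1, each shirt would come back as a single representative record; the remaining colors could be fetched with a second query once the user selects it. Whether grouping is available depends on the Solr version in use, so flattening the data model remains the fallback.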

robertdouglass's picture

Solr Next Gen - the 7.x-2.x refactoring

We've learned a lot about Solr and Drupal in the past three years. Much is possible with the current ApacheSolr module, but some things aren't possible, and many things aren't easy. There are quirks and limitations that reflect early design decisions which we now could solve better. To move forward and make the future a better place for Solr and Drupal, a new effort is beginning to redesign the integration from the ground up.

These are some of the high level design goals:

  • Study the PECL library, learn from it, and possibly use it: http://pecl.php.net/package/solr
  • Develop a query library that has improved developer usability; for example http://github.com/technosophos/SolrAPI/blob/master/solrapi.inc
  • Develop (with) components that are not Drupal specific so that other open source projects can use them (see above two points)
  • Take advantage of cool things like the Search API, where it makes sense: http://drupal.org/project/search_api
  • Learn from efforts like Searchlight and enable them to build on shared core components: http://drupal.org/project/searchlight
  • Remove node centricity, embrace the Entity API in Drupal 7, and rely on Views for as much as possible
  • Make indexing more flexible and faster
  • Decentralize the module structure so that more contributors can become involved in more projects
  • Allow Solr to be used in more contexts than traditional search (faceted browsing, for example)

Matt Butcher will be coordinating initial planning and research, and together with input from you, will draft architecture documents to get us from where we are today to the bright and shiny future.

The project will use this group as its home. There is now a new tab/page, "Solr Next Gen", as well as a tag that you can subscribe to, so that we can track discussions. Development will happen in the 7.x-2.x branch of the apachesolr module.

Now is a good time to elaborate on the list of design goals that you'd like to see in the new architecture by commenting here.

karma.code's picture

Searching on only one SOLR field

I would like to query ApacheSOLR to return results based on one field in a node but I do not know how to do this. For example, instead of searching for the word 'dog' in a node's title, body, created date, etc, I only want to search the node title for the word 'dog'.

Can I do this by implementing hook_apachesolr_modify_query? Any feedback on this would be much appreciated.
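
At the Solr level, one way to restrict a keyword search to a single field is the DisMax `qf` (query fields) parameter. The sketch below shows only the resulting request parameters; the field name `title` is an assumption about the schema, and `build_single_field_query` is a hypothetical helper. Inside Drupal, the equivalent would presumably be to adjust the `qf` entry of the request parameters from the hook, but that mapping is an assumption rather than something confirmed here.

```python
from urllib.parse import urlencode


def build_single_field_query(keywords, field='title'):
    """Return a DisMax query string that searches only one field."""
    params = [
        ('defType', 'dismax'),
        ('q', keywords),
        ('qf', field),  # the only field the keywords are matched against
    ]
    return urlencode(params)


if __name__ == '__main__':
    print(build_single_field_query('dog'))
```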

yasheshb's picture

Using hooks for processing all results and facets - Apachesolr 1.0

Hello,

We're trying to filter the results provided by apachesolr in the hook apachesolr_process_results (not sure if this is the right place).
Here's what I'm trying for a franchise locator:

Content Type: Franchisee (a bunch of fields for title, type, services provided, etc. + location CCK)
Number of nodes: 2000

Now we're building an Advanced Search form which combines the facets for Franchise Type and a proximity search using Zipcode + Radius.
The form captures the user input: keywords, franchise_type, zipcode + radius.
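
If the proximity part ends up being applied to results after Solr returns them, the core of it is a great-circle distance check. The sketch below is a generic illustration, not the module's method: it assumes the zipcode has already been resolved to a latitude/longitude pair and that each result carries `lat` and `lon` values, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(origin, results, radius_miles):
    """Keep results whose (lat, lon) lies within radius_miles of origin."""
    lat0, lon0 = origin
    return [r for r in results
            if haversine_miles(lat0, lon0, r['lat'], r['lon']) <= radius_miles]
```

One caveat with filtering after the fact is that the facet counts Solr returned will still describe the unfiltered result set, so they can disagree with what the user sees.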

agatlin's picture

Newbie SOLR Questions

We are hoping to use SOLR in a couple of non-standard implementations, and I just have a few questions.

  1. If we want to index all of the documents in a specific file directory (e.g. TIFF images of scanned documents), can we do this directly with SOLR, or do we need Nutch? (I realize TIFFs are quite a legacy format, but in this instance conversion to PDF is not an option.)

  2. If there are specific documents on a remote site we want to index with SOLR (again, specific TIFF documents), what is the best way to accomplish this? (We have the specific URLs for these documents.)

R-H's picture

User Profile Search

Hello,

Wondering if Apache Solr as implemented in Drupal has the ability to do user profile search. Say I wanted to allow my users to search for other users who meet x, y, z criteria and who have indicated that they are willing to be contacted by another member of the site. Would Solr support that?

Cheers!

t14's picture

Searching content stored in XML using Lucene

Hi

I want to implement the Lucene search module in my Drupal site.

However, I have a custom module that I developed myself which pulls content from an XML file and displays it using XSLT.

I was wondering if it is possible to use the Lucene search module to offer search capabilities for my XML content.

The XML content is stored in files on the server and not in a database.

Is it possible to search XML content with the Lucene module or its API?

I am also interested in any other solutions for searching XML content.

Thank you in advance for your time.

mkalkbrenner's picture

Apache Solr Multilingual - Non-English and Multilingual Search

We just released the first alpha version of Apache Solr Multilingual which supports language specific stemming, synonyms and compound word splitting. There's still a lot to do but any feedback at this early stage of the project will be helpful.

slnm's picture

Solr facet not appearing in sidebar

I have Apache Solr Search Integration 6.x-1.0-RC3. I have several taxonomies. Solr is happily searching them and I've enabled facet blocks for them. All is fine except when I try to enable a facet that wasn't enabled before (via admin/settings/apachesolr/enabled-filters) and enable the associated block (via admin/build/block/list/) and save the block change, the block doesn't appear.

Am I forgetting something? Suggestions?

Thanks.

yavinty's picture

How to consume external search results via OpenSearch?

If I am not missing anything, it looks like the OpenSearch module allows Drupal to be an OpenSearch provider. What I would like is to make Drupal consume search results from external systems (such as Nutch) that also provide OpenSearch.

In other words, I want Drupal to be an "importer" of search results from an external Nutch instance, rather than the "exporter".

I would appreciate it if someone could shed some light on such an integration. Thanks.
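
As a language-neutral illustration of the "importer" side, an OpenSearch response is typically an RSS (or Atom) feed carrying a few extra elements such as `totalResults`. The sketch below parses an RSS-style response that has already been fetched. `parse_opensearch_rss` is a hypothetical helper, and the default namespace is the OpenSearch 1.1 one; an older provider may use a different namespace URI, so it is left as a parameter.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = 'http://a9.com/-/spec/opensearch/1.1/'


def parse_opensearch_rss(xml_text, ns=OPENSEARCH_NS):
    """Extract the total hit count and result items from an OpenSearch RSS feed."""
    root = ET.fromstring(xml_text)
    channel = root.find('channel')
    total = channel.findtext('{%s}totalResults' % ns)
    items = [
        {
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
            'description': item.findtext('description', default=''),
        }
        for item in channel.findall('item')
    ]
    # Fall back to the number of items if the provider omits totalResults.
    return {'total': int(total) if total else len(items), 'items': items}
```

The returned items could then be themed like any other list of links; paging would additionally need the `startIndex` and `itemsPerPage` elements, which are read the same way.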

Anonymous's picture

Solr + maps

Hi,

Has anyone managed to display the output of a Solr search on a map, using gmap, openlayers, etc.?

Also, has anyone got spatial search / local search working with UK postcodes?

I've been trying for a week now with no success. I can get core Drupal search + facets working with proximity and map output, but Solr is so much better!

TIA

yavinty's picture

Building web search engines with Drupal and Nutch

I am building a vertical web search engine. I would like to integrate Nutch with Drupal via the OpenSearch Aggregator module; however, the module is not compatible with Drupal 6.10+. Should I switch my search engine to Solr and use the ApacheSolr module instead? Does the ApacheSolr module support OpenSearch? What are my integration options?

drunken monkey's picture

Creating a generic Search API

The goal of this project is to build a generic Search API that abstracts, on the one hand, from the data source (using the entity_metadata module), thus allowing all kinds of entities to be indexed and searched as easily as nodes, and, on the other hand, from the indexer / search engine, so that concrete implementations like Solr, Lucene, Xapian, … implement only the specific details, thereby eliminating unnecessary code duplication.
Also, the gathered metadata and the search engine interface could be used to create a generic Views integration for all searches, thus letting all supported searches display their results as a configurable view.
The planned overall design is sketched in the attached diagram.

patjov's picture

Creating Research Portal - 2.5M entries to index

Hello Community,

I am currently working on a non-commercial research school project. We are facing the following decision and I was wondering if you have some advice or experience in this matter.

Problem: Create a search engine (plus community functions etc.) for our research field, with additional tag clouds, coauthoring, etc. to enhance the search.

Solution: We have created a MySQL DB which contains 2.5M entries (one entry is one paper). It includes the metadata as well as the abstracts.

dstuart's picture

Possible Solution | Dev Help

Seeing the post [Dev Help] http://groups.drupal.org/node/46654 spurred me into looking at generic solutions to the problem. I have written a quick blog post about it; have a read and see what you think.

http://axistwelve.com/node/15

Comments are more than welcome

Regards,

Dave
