Desiring help with Solr, Nutch, Facet API

sethhill's picture

Mathematic Arts is a Drupal development firm in Milwaukee. We recently developed a web site for a research library in Drupal 7, and implemented Solr and Nutch for the search facility. We are using Facet API to filter search results based on a few simple criteria, but would like to build some more complex filters and improve the user experience.

For example:

  • Have a facet like content type, but one that groups together the many general content types whose distinctions are meaningless to a user. For example:
    General [which includes multiple content types: text pages, landing pages, announcements, etc.]
    Events [unique content type]
    Blogs [unique content type]

  • Have a facet that evaluates the date field in an event content type and gives upcoming/past filtering options:
    Upcoming events [event date >= today]
    Past events [event date < today]

  • Have a facet that filters by date but that excludes external resources whose date isn't relevant

  • Have a facet that lets us exclude results that aren't on the main site by default (we're using Nutch to index external resources). Would be lovely if this could be radio buttons:
    Search main site only [default, only shows results that are in the Drupal db]
    Search all sites [searches full index]
    If we could extend this further to let users pick and choose which sites they want to search, that would be even better.
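
For anyone picking this up, here is a rough sketch of how the facets above might map onto plain Solr request parameters, using facet.query buckets for the grouped content types and the upcoming/past split, and an fq filter for the "main site only" default. It is only an illustration in Python, not Facet API code: the field names (bundle, event_date, site), the content type machine names, and the example site URL are all assumptions and would need to match the actual Solr schema and Drupal configuration.

```python
from urllib.parse import urlencode

# Hypothetical field names; replace with whatever the Solr schema defines.
TYPE_FIELD = "bundle"
DATE_FIELD = "event_date"
SITE_FIELD = "site"

# Assumed machine names for the content types grouped under "General".
GENERAL_TYPES = ["page", "landing_page", "announcement"]


def facet_queries():
    """Map each facet label to the Solr query that defines its bucket."""
    general = " OR ".join(GENERAL_TYPES)
    return {
        "General": TYPE_FIELD + ":(" + general + ")",
        "Events": TYPE_FIELD + ":event",
        "Blogs": TYPE_FIELD + ":blog",
        # NOW/DAY is Solr date math for the start of the current day.
        "Upcoming events": DATE_FIELD + ":[NOW/DAY TO *]",
        # Curly braces make the range exclusive, so today is left out.
        "Past events": DATE_FIELD + ":{* TO NOW/DAY}",
    }


def build_params(keywords, main_site_only=True, main_site="https://example.org/"):
    """Build the list of (name, value) pairs for a Solr select request."""
    params = [("q", keywords), ("facet", "true")]
    for query in facet_queries().values():
        params.append(("facet.query", query))
    if main_site_only:
        # Restrict results to documents indexed from the main site.
        params.append(("fq", SITE_FIELD + ':"' + main_site + '"'))
    return params


if __name__ == "__main__":
    # Print the query string that would follow /select? on the Solr URL.
    print(urlencode(build_params("manuscripts")))
```

Each facet.query comes back from Solr with its own count, so the labels can be rendered as links or radio buttons, and selecting one would add the same query as an fq. Calling build_params with main_site_only=False drops the site filter and searches the full index, including the Nutch-crawled resources.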

We could really use some help from someone with legitimate experience with all of these components. If interested, contact me at shill [at] matharts [dot] com for more information. Please be prepared to describe your credentials or provide samples of work. Thank you!

Comments

crawl-error

helen1366's picture

Hi, I am crawling a site that has 100 links at depth 1 and 100 links at depth 2, but Nutch only crawls 23 links from depth 1 and 30 from depth 2. How can I force Nutch to crawl all links at depths 1 and 2? I am using Nutch 1.3 with:
topN=10000
depth=2
and in my nutch-site.xml:

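<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>http.agent.name</name>
  <value>My Nutch Spider</value>
</property>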

Lucene, Nutch and Solr
