Searching over multiple (heterogeneous) indexes

robertdouglass's picture

(repost from http://drupal.org/node/296198 by drunkenmoney)

While implementing the attachment indexing mechanism, we (febbraro, robertDouglass and I) stumbled across a problem: how should the attachment text be stored?
It would be easy to append it to the "text" field, add a new multi-valued field, or both. But then it would be impossible to tell at search time where a term occurred, and that is unfortunately a requirement: the attachments themselves should appear directly in the search results, not just links to the nodes containing them.

To achieve this, we have to store attachments as separate documents, but adding them to the same index as the nodes wouldn't be very clean. So we'd have to create a second index for attachments and then always search both indexes (or give the user the option to search only one of them). This is supposed to be possible, but none of us knows how.
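
To make the "search both indexes" idea concrete, here is a minimal sketch of merging two result lists on the client side. It does not talk to Solr at all; the `Hit` type, the `source` labels and the `merge_hits` function are hypothetical names made up for illustration. It also assumes that scores from two separate indexes are not directly comparable, so each list is scaled by its own best score first, which is only a crude heuristic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    url: str      # unique identifier of the indexed document
    score: float  # relevance score as reported by its own index
    source: str   # hypothetical label: "node" or "attachment"


def normalize(hits: List[Hit]) -> List[Hit]:
    """Scale scores to [0, 1] by dividing by the best score in this list."""
    top = max((h.score for h in hits), default=0.0)
    if top <= 0:
        return [Hit(h.url, 0.0, h.source) for h in hits]
    return [Hit(h.url, h.score / top, h.source) for h in hits]


def merge_hits(node_hits: List[Hit],
               attachment_hits: List[Hit],
               limit: int = 10) -> List[Hit]:
    """Combine hits from two indexes into one list ranked by scaled score."""
    combined = normalize(node_hits) + normalize(attachment_hits)
    combined.sort(key=lambda h: h.score, reverse=True)
    return combined[:limit]
```

Letting the user search only one index then amounts to passing an empty list for the other one. Whether per-index scaling gives a sensible ranking would have to be checked against real data.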

Anyone here got an idea? Or a suggestion for an entirely different route, even?

Comments

Question: my organization is

west_d_r's picture

Question: my organization is creating a document library based on CCK nodes and taxonomy. We would want only one search result (the node), not two (the node and the document). Is there a way to reflect the relationship between the document and the node to which it's attached?

Of course, I'm not a programmer, and there may be a better way to attach descriptions and metadata to a document. Just wanted to throw it out there.

primary key is now url, not nid

robertdouglass's picture

We've made some progress here. The primary key in the index is now a URL, so we can already search across several Drupal websites (as long as they are homogeneous, i.e. use the same schema). Heterogeneous search will still be tricky, and possibly not as effective, but at least the identifier (URL) can support anything we throw at it.
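
As a rough illustration of why a URL works better than a nid as the key, here is a small sketch. The `document_key` and `build_index` helpers and the example sites are hypothetical, not the module's actual code: two sites can both have a node 12, but their URLs differ, and an attachment gets a URL of its own too.

```python
from urllib.parse import urljoin


def document_key(site_base_url: str, path: str) -> str:
    """Build a globally unique key from a site's base URL and a local path."""
    return urljoin(site_base_url.rstrip("/") + "/", path.lstrip("/"))


def build_index(docs):
    """Key documents by URL so equal nids on different sites do not collide."""
    index = {}
    for doc in docs:
        index[document_key(doc["site"], doc["path"])] = doc
    return index


docs = [
    {"site": "http://a.example", "path": "node/12", "nid": 12},
    {"site": "http://b.example/", "path": "/node/12", "nid": 12},
    {"site": "http://a.example", "path": "files/report.pdf", "nid": None},
]
index = build_index(docs)
```

Keyed by nid, the first two documents would overwrite each other; keyed by URL, all three stay distinct.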

Lucene, Nutch and Solr
