Overview
The goal of this project is to build a generic Search API that abstracts from two things. On the one hand, it abstracts from the data source (using the entity_metadata module), so that all kinds of entities can be indexed and searched as easily as nodes. On the other hand, it abstracts from the indexer / search engine, so that concrete implementations like Solr, Lucene, Xapian, … only have to implement their specific details, which eliminates unnecessary code duplication.
In addition, the gathered metadata and the search engine interface could be used to create a generic Views integration for all searches, thus letting all supported searches display their results as a configurable view.
The planned overall design is sketched in the attached diagram.
Description
The project's deliverable will be a module (probably named "search_api") which tries to generically implement all search tasks that aren't specific to a concrete implementation, while leaving enough freedom to accommodate the particular needs of each implementation. The administrator will be able to create searches based on any entities known to the entity_metadata module, using any search engine that implements the API's interface.
At index time, the API module would then use the metadata to retrieve the items to index for all created searches/indexes and feed them to the concrete indexers with the right data types, etc. Concrete indexers would only have to do the fundamental indexing; they would not have to keep track of which data still needs to be indexed, for instance. How the API could additionally help at search time still needs to be determined; maybe (configurable) advanced search forms could be generated from the metadata.
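To make the index-time flow described above a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; the actual module would be Drupal/PHP code). All names in it (`SearchBackend`, `InMemoryBackend`, `SearchIndex`, `mark_dirty`, `index_pending`, `CONVERTERS`) are hypothetical and not taken from any existing module. It only illustrates the intended division of labour: the API layer tracks which items still need indexing and casts field values to their declared types, while the backend does nothing but index what it is handed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class SearchBackend(ABC):
    """Hypothetical interface a concrete engine (Solr, Xapian, ...) would implement."""

    @abstractmethod
    def index_items(self, items: dict) -> list:
        """Index the given items and return the IDs indexed successfully."""


class InMemoryBackend(SearchBackend):
    """Toy backend that exists only to make this sketch runnable."""

    def __init__(self) -> None:
        self.documents = {}

    def index_items(self, items: dict) -> list:
        self.documents.update(items)
        return list(items)


# Hypothetical converters, keyed by the data type declared in the metadata.
CONVERTERS = {"text": str, "integer": int, "decimal": float}


@dataclass
class SearchIndex:
    """API-side bookkeeping: which items are pending and how fields are typed."""

    backend: SearchBackend
    field_types: dict                      # field name -> declared data type
    pending: set = field(default_factory=set)

    def mark_dirty(self, item_id: str) -> None:
        """Remember that an item was created or changed and needs (re)indexing."""
        self.pending.add(item_id)

    def index_pending(self, load_item, limit: int = 50) -> int:
        """Load up to `limit` pending items, cast their fields, hand them to the backend."""
        batch = sorted(self.pending)[:limit]
        items = {}
        for item_id in batch:
            raw = load_item(item_id)
            items[item_id] = {
                name: CONVERTERS[type_name](raw[name])
                for name, type_name in self.field_types.items()
                if name in raw
            }
        indexed = self.backend.index_items(items)
        # Only items the backend confirmed are removed from the pending set.
        self.pending.difference_update(indexed)
        return len(indexed)
```

In this sketch a backend that fails to index some items simply omits their IDs from its return value, and the API layer retries them on the next run, so the backend never has to maintain that state itself.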
In any case, the metadata could also be used to build data type-aware Views integration, even when data types aren't supported by the underlying indexer, and with the indexer having hardly anything to implement on its own. This Views integration could be used to automatically create a search results page view for searches.
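One way such type awareness on top of a type-less indexer might work is sketched below, again in Python and purely as an illustration. It assumes a backend that can only store strings; `FIELD_TYPES`, `encode`, `decode`, `to_backend` and `from_backend` are hypothetical names, and the zero-padding trick for integers is just one possible encoding (it only handles non-negative numbers of up to ten digits).

```python
from datetime import date

# Hypothetical metadata: field name -> declared data type.
FIELD_TYPES = {"title": "text", "created": "date", "comment_count": "integer"}


def encode(type_name: str, value) -> str:
    """Turn a typed value into a string the backend can store and compare."""
    if type_name == "integer":
        # Zero-pad so that string order matches numeric order (non-negative only).
        return f"{value:010d}"
    if type_name == "date":
        # ISO dates (YYYY-MM-DD) also sort correctly as plain strings.
        return value.isoformat()
    return str(value)


def decode(type_name: str, raw: str):
    """Restore the typed value from the stored string, guided by the metadata."""
    if type_name == "integer":
        return int(raw)
    if type_name == "date":
        return date.fromisoformat(raw)
    return raw


def to_backend(item: dict) -> dict:
    """Encode every field of an item according to its declared type."""
    return {name: encode(FIELD_TYPES[name], value) for name, value in item.items()}


def from_backend(document: dict) -> dict:
    """Decode a stored document back into typed values for display or filtering."""
    return {name: decode(FIELD_TYPES[name], raw) for name, raw in document.items()}
```

The point of the sketch is that everything type-related lives in the metadata-driven layer, so a display or filter component could work with real dates and numbers while the indexer only ever sees strings.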
Well, on the whole this sounds like quite some work, and before actually applying I'll definitely need a few good estimates of how much of this is really feasible for a single GSoC project. Anyone's opinion here is welcome, also regarding priorities and which parts would minimally need to be implemented for this module to be of any use (or work at all). Given enough time, however, even things like generic facet blocks should be possible.
About Me
I'm a 23-year-old CS Master's student living in Vienna, Austria, and already a bit of a GSoC veteran, as this would be my third Summer of Code project for Drupal. In 2008, I provided Views with pluggable data backends and implemented one for the apachesolr module.
Last year, I created the apachesolr_rdf module, which is a bit similar to this proposal but uses RDF instead of the (then non-existent) entity_metadata module and is only for Apache Solr. It also lacks Views integration, but I still think the experience gathered with that project (especially its flaws, and some of the difficulties that had to be overcome) could be put to good use in the one proposed here.
Mentors
| Attachment | Size |
|---|---|
| concept_overview.png | 20.81 KB |

Comments
Don't forget sphinx
While I know Solr is awesome and all, sphinxsearch trunk already supports real-time updates and it will surely have a release by the time this project gets underway. Please integrate with it too.
Project scope
I don't think I'll have enough time in the project to integrate every search engine on my own. I'll provide the interface and will probably integrate apachesolr as an "example" implementation (because I'm the most familiar with it), but other search engines would have to integrate with the API on their own. The project already looks rather large for a single SoC project, so I can't and won't promise to integrate all (or even only the most popular) search engines on my own as well. (Especially as I'm unfamiliar with most of their code bases.)
But of course the API would be written in a way that lets engines apart from Solr integrate just as easily. So hopefully whoever integrates Sphinx would not have a hard time doing it.
And maybe, if we take a wider look, I could work on improving the API / Views integration / etc. and integrate some other search engines next year — if the demand is there.
sphinx++
I agree. We're using sphinxsearch on Ketnet and it deserves as many eyes on it as Solr.
Hmmm...
Just for reference, I'm now (for better or worse, because no one else stepped up) the maintainer of the core Search module, and I also have a couple of contrib search-related modules on drupal.org.
So... The core Search module does need to be reworked for Drupal 8 (hopefully). I've outlined a few ideas on http://drupal.org/node/717654 already, and am wondering how this idea fits in with those concepts, or if you're planning on having this just be a completely separate contributed module going forward?
If you're thinking about having this idea replace the core Search module, ...
One thing the core Search module currently does have is the flexibility for other modules to extend it. I think maybe your idea would remove the need for some of that, but I wouldn't want to lose the flexibility of allowing things that aren't nodes or entities to be searched within the core Search module (for instance, my Search by Page contributed module allows you to search generated pages such as Views). Or the flexibility to let an outside module define how the content to be indexed is generated (for example, Search by Page renders it in the theme from the point of view of a particular user role, rather than using the default rendering view of the node, for indexing). Or the flexibility to not index everything (Search by Page lets you choose user roles and content types to be indexed, and excludes others).
Anyway, I'm definitely hoping to make the core Search module more ... well, for lack of a better word, modular in Drupal 8, and definitely would welcome some help. Someone who's familiar with Solr could definitely contribute in breaking Search up into modular pieces so it would better integrate with Solr, Lucene, etc. as a back end for doing the indexing and searching, if that is even possible.
Drupal programmer - http://poplarware.com
Drupal author - http://shop.oreilly.com/product/0636920034612.do
Drupal contributor - https://www.drupal.org/u/jhodgdon
Thanks
Thanks for the link, those are some quite impressive ideas. Especially "Multiple tabs per module" and "Pluggable index and retrieval" are issues I already stumbled across with the current core Search module. I really like where this is going and would love my module to be of help.
Of course, for D7 this has to be a contrib module, but depending on the reception it could certainly be best to incorporate some of it into core Search for D8. I don't know if my module could really replace core Search, but there are of course several overlaps. It seems I should in any case take a look at the Search by Page module, as it obviously does some interesting things. ;) And since I'm all for flexibility anyway, I can of course try to support those use cases, if possible. Allowing data sources apart from entities, for example, should be rather simple as far as my experience with apachesolr_rdf goes. And now that I'm aware of this use case, there doesn't seem to be any reason not to account for it.
In any case thanks for your effort and the best of luck with core Search! It would certainly profit from some rework.
Searchlight from Development Seed
I'm not sure where they're taking this... but this experimental search backend module Young Hahn has put up on github looks interesting.
http://github.com/yhahn/searchlight
From the README:
Kyle Mathews
Thanks for the link
Thanks for the link, this really looks interesting. A shame it isn't usable yet, but I'll definitely take a look. The paradigm seems to be a bit different, but the overall goal (at least backend-wise) matches mine, and it should at least be possible to gain some insights from that module's design and development.
Proposal available
I've now finished and submitted a proposal, please feel free to read it if you're interested. I've set it to "public", so it should be somehow readable not just by mentors, but I don't really know if or how that works. Anyways, if you can gain access, it should be available here: Creating a generic Search API
If this public access thing doesn't work, I could also post it here — just ask.
facet api
Just to mention Facet API module. I see efforts around searching in drupal, kudos for all!
Yes post your proposal here
Yes, post your proposal here!
I'm highly interested in this project.
We want to work on a Search API that could work with both the default MySQL engine and Solr (at least).
We have many points to work on:
It's a bit outdated now …
If you want, I could post it, but since the project was already finished some months ago (can't really say from your comment if you knew that already), the proposal is quite outdated already.
These pages provide far more accurate information:
NO lol the module looks
NO lol the module looks wonderful... but the proposal page came first in Google...
What do you think about CCK integration?
And about enhancing/replacing the default content search in admin?
PS: Our goal is to create a back office to manage a huge number of nodes...
The module is Drupal 7, so
The module is Drupal 7, so it's mostly about Fields (which are fully supported), not CCK. I don't know exactly whether node references are supported or not; if they aren't, this would have to be fixed either in the Entity API module, or in CCK by providing proper entity property info definitions. Best ask in the Entity API module first. Or rather, test whether node reference fields are recognized first, and if they aren't, ask.
Either way, I'm afraid you'd have to code the widget yourself, since such things aren't contained in the API. But there is a Views integration, so maybe this would be rather easy to do.
The default content search can be replaced, although having it in a tab like the normal searches would require a little extra code.