Today at DrupalCon San Francisco, 11 of us gathered to discuss what the ideal architecture would be for Search in Drupal Core. Present were maintainers/users of core Search, various contrib search modules, Lucene, and Solr:
http://drupal.org/user/153120 - janusman
http://drupal.org/user/277371 - awolfey
http://drupal.org/user/29191 - douggreen
http://drupal.org/user/266779 - cpliaka
http://drupal.org/user/472460 - jpmckinney
http://drupal.org/user/10297 - unexpand
http://drupal.org/user/733232 - nihiliad
http://drupal.org/user/49851 - pwolanin
http://drupal.org/user/157079 - mradcliffe
http://drupal.org/user/155601 - jhodgdon
We all agreed that we want the core Search module to be pluggable/modular. Here are some notes:
* We want the whole system to be pluggable.
* Steps in the indexing/cron process:
- Decide what needs to be indexed (could be nodes, other entities, ...).
- Render each item, or build a structured renderable array.
- Pre-process (stemming, n-grams, word splitting, etc.)
- Index each item
* Steps at search time:
- User interface - ask user for the search query (defined syntax, faceted, etc.)
- Preprocess (as in indexing)
- Query to get search results (with ranking)
- Post-process (spelling suggestions, etc.)
- Extract excerpts/highlight
- Display results
* All of these steps should be pluggable.
* Core search would be a framework that coordinates the steps and keeps track of what content needs to be indexed for each pluggable search index/retrieval framework.
* We could also provide a Google/Yahoo/etc. search box, like the one in Firefox.
* We would also provide (maybe as a contrib module) a default storage/retrieval method, basically the current core search mechanism but perhaps limited to single keywords (for efficiency).
* Doug is also interested in building a MongoDB implementation of storage/retrieval.
* Solr, Lucene, etc. would also be able to provide storage/retrieval implementations.
* Needs to be language-aware and compatible with multilingual sites.
* Needs to be extensible, e.g., supporting facets as one extension, "advanced search", etc.
* Write it using PHP objects.
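A minimal sketch of how the coordinating framework and its pluggable steps could fit together, written in Python purely for illustration (the notes call for PHP objects). All class and method names here (`Preprocessor`, `Backend`, `WordSplitter`, `KeywordBackend`, `SearchFramework`, `cron`) are hypothetical. It covers only the render, pre-process, index, and ranked-query steps; the user interface, post-processing, and excerpt steps would be further plug points of the same shape.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Protocol, Set, Tuple


class Preprocessor(Protocol):
    def process(self, text: str) -> List[str]: ...


class Backend(Protocol):
    def index(self, item_id: str, tokens: List[str]) -> None: ...
    def query(self, tokens: List[str]) -> List[Tuple[str, float]]: ...


class WordSplitter:
    """Pre-processing plug-in: lowercase and split on non-word characters.
    Stemming or n-grams would be other plug-ins with the same method."""

    def process(self, text: str) -> List[str]:
        return [t for t in re.split(r"\W+", text.lower()) if t]


class KeywordBackend:
    """Default storage/retrieval: an inverted index of single keywords."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)

    def index(self, item_id: str, tokens: List[str]) -> None:
        for items in self.postings.values():  # drop any stale copy first
            items.pop(item_id, None)
        for token in tokens:
            self.postings[token][item_id] = self.postings[token].get(item_id, 0) + 1

    def query(self, tokens: List[str]) -> List[Tuple[str, float]]:
        scores: Dict[str, float] = defaultdict(float)
        for token in tokens:
            for item_id, count in self.postings.get(token, {}).items():
                scores[item_id] += count
        # Rank by score, then by id so ties come out in a stable order.
        return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


class SearchFramework:
    """Coordinates the pluggable steps and tracks what still needs indexing."""

    def __init__(self, render: Callable[[str], str],
                 preprocessor: Preprocessor, backend: Backend) -> None:
        self.render = render
        self.preprocessor = preprocessor
        self.backend = backend
        self.pending: Set[str] = set()

    def mark_for_indexing(self, item_id: str) -> None:
        self.pending.add(item_id)

    def cron(self, limit: int = 100) -> None:
        # Indexing: render -> pre-process -> index, one batch per cron run.
        for item_id in sorted(self.pending)[:limit]:
            tokens = self.preprocessor.process(self.render(item_id))
            self.backend.index(item_id, tokens)
            self.pending.discard(item_id)

    def search(self, keys: str) -> List[Tuple[str, float]]:
        # Searching: the same pre-processing, then a ranked backend query.
        return self.backend.query(self.preprocessor.process(keys))
```

Swapping in a Solr or MongoDB backend would mean supplying another object with the same `index`/`query` methods; the framework class itself would not change.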
Next steps:
* Maintainers of Lucene, Solr, etc. will describe what they needed to modify in the core framework to get things working.
* Hopefully everyone will tell me what I missed and got wrong in this post.
* We'll have some meetings on IRC/Skype, working towards a sprint or work session.
* Keep in touch with the GSoC student who's working on search, so that their work can hopefully contribute productively to this effort.
See also:
http://drupal.org/node/717654 (Search in D8 and beyond - basically a collection of feature requests for D8, somewhat categorized)
Comments
Search term auto-completion and live results...
Are features like auto-completion of search terms (a la Google) and live as-you-type results (a la Spotlight) on the feature radar for searching within a site? Would be cool.
For an example of live as-you-type results see: http://www.w3schools.com/php/php_ajax_livesearch.asp
There already seems to be a module, though it may have stalled: http://drupal.org/project/livesearch
@dahacouk: I think core should make it easier to develop such widgets, but it should not be responsible for maintaining any of them.
@jhodgdon Thanks for getting all this in writing. It was great to meet everyone who attended the dinner, and I'm greatly looking forward to this. I'll post something more substantial after my vacation :-).
Here's an interesting project by Young Hahn: http://github.com/yhahn/searchlight. It was mentioned by kyle_mathews in http://groups.drupal.org/node/57273#comment-163408, but I thought it would be appropriate to cross-post it here. It seems to have a lot of the elements we are looking for, although as written it wouldn't be fit for core because of its dependencies. Regardless, there are lots of great ideas in there.
And now the SoC project for creating a generic search API has been approved. I guess we need to hurry up and get organized so we don't waste his effort.
Awesome initiative!
As one of the search subsystem maintainers (jhodgdon being the other), I'd like to be somewhat central to what happens here. I'm hoping that the SoC project can take some guidance from all of us.
Here's my vision:
To accomplish this we'll probably start from scratch. There seem to be a couple projects already under way for this, maybe searchlight or the SoC project.
Someone please enlighten me as to the SoC schedule. If we have time, it would be nice to keep adding to this discussion for the next week or so, then have a phone/IRC meeting with all the interested parties and decide how much code (if any) we need to write before the SoC project gets started.
IMO, those who know Drupal search and its problems should provide enough leadership here that the SoC student has some general direction before being turned loose. I think weekly meetings between us and the SoC student (think lots of mentors) would be a good thing.
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com
Google/Yahoo...
One thing that you've mentioned a couple of times and that I'm not sure about is the idea of a "choose your engine" thing that would search Google, Yahoo, etc.
Are you just suggesting that if someone uses this, and searches for "foo" for example, they would be redirected to google.com (or whichever engine), with their keywords + site:example.com in there (or whatever is appropriate for the engine)? So it would take the visitor completely off their site?
If so, I think this would be better as a contributed module, rather than something that Drupal core endorses.
Drupal programmer - http://poplarware.com
Drupal author - http://shop.oreilly.com/product/0636920034612.do
Drupal contributor - https://www.drupal.org/u/jhodgdon
Yes
Yes, I hear you. Where we put it is not a technical decision, but I do want to make sure that our "pluggable" search API supports it. I think it would be nice for out-of-the-box Drupal core to support the top X search engines.
I think it would involve a little more than just search, though. For example, when we mark a node for update, we'll want to send it to the search engine backend with a request to re-index it.
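One way the core framework could keep track of what each backend still needs to index, so that marking a node for update queues a re-index request for every registered backend, local or remote. This is a hypothetical Python sketch; `IndexTracker`, `mark_for_update`, and `take_batch` are illustrative names, not an existing Drupal API.

```python
from typing import Dict, List, Set


class IndexTracker:
    """Hypothetical tracker: one pending set per registered backend."""

    def __init__(self, backend_names: List[str]) -> None:
        self.pending: Dict[str, Set[str]] = {name: set() for name in backend_names}

    def mark_for_update(self, item_id: str) -> None:
        # A node update fans out to every backend's queue.
        for queue in self.pending.values():
            queue.add(item_id)

    def take_batch(self, backend: str, limit: int) -> List[str]:
        # Each backend drains its own queue, so a slow remote service
        # does not hold up indexing for the local backend.
        batch = sorted(self.pending[backend])[:limit]
        self.pending[backend].difference_update(batch)
        return batch
```

A remote backend would turn each item in its batch into whatever re-index request its service expects; the tracker only records what is outstanding.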
Doug Green
Requires business-level account
I think what you're talking about (node notifications) would require the site owner to sign up for a business-level, paid account with the search provider. Such an account would also potentially let them retrieve the results and display them within the Drupal site (Google Custom Search does this, at least), which would be a good thing for our architecture to support.
So I agree we should make the pluggable architecture able to get results from someone like Google and display them, if the site owner has an account that allows it. However, I don't see why our architecture would need to support a box that lets people type in a search query and then redirects them to an outside site to display the results. That's a simple block containing a form that submits to another site, and IMO not really related to our search architecture at all.
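For comparison, the outbound box just described needs nothing from the search framework: the form only builds a results URL on the engine's own site, restricted with the `site:` operator. A hypothetical Python sketch; the engine URLs and the `q` parameter are assumptions based on Google's and Bing's public search pages and should be checked before use.

```python
from urllib.parse import urlencode

# Hypothetical engine table (assumed URL formats; verify before relying on them).
ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def external_search_url(engine: str, keys: str, site: str) -> str:
    # Limit the query to this site, then hand the visitor off to the
    # engine's own results page; nothing is indexed on the Drupal side.
    query = f"{keys} site:{site}"
    return ENGINES[engine] + "?" + urlencode({"q": query})
```

For example, `external_search_url("google", "foo", "example.com")` builds `https://www.google.com/search?q=foo+site%3Aexample.com`.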
Drupal contributor - https://www.drupal.org/u/jhodgdon
You lose me with field-level permissions for searching. We don't support this in the Apache Solr integration now, and while an implementation is feasible, it would be quite a pain. I don't think it should be a priority for a generic framework, and perhaps it shouldn't be included at all.
Perhaps we should also be looking at Views 3 in terms of generically defining queries and providing ways for the response to be formatted?
Isn't this a security problem?
If a site has hidden a field from some users, but we index it, and let someone search on it, whether we display it or not, isn't this a security problem?
Doug Green
Yes, so don't use field-level permissions, or accept that you should only index and search on fields that are accessible to all users. If you need to search or filter on some special administrative field, you might have to build a custom interface.
I'm pretty sure that restricting search based on field-level permissions is not supported by core search today. Access is restricted at the entity (node) level but not at the field level; see http://api.drupal.org/api/function/node_search_execute/7
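A sketch of the "only index fields accessible to all users" approach, assuming a simplified permission model in which each field lists the roles allowed to view it, and that anything the anonymous role can view is visible to everyone. `FieldDefinition`, `view_roles`, and `indexable_text` are hypothetical; a real Drupal 7 implementation would presumably consult `field_access()` instead.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class FieldDefinition:
    name: str
    view_roles: Set[str]  # hypothetical: roles allowed to view this field


def indexable_text(entity: Dict[str, str], fields: Dict[str, FieldDefinition],
                   public_role: str = "anonymous user") -> str:
    """Concatenate only the field values that every visitor may see."""
    parts = []
    for name, value in entity.items():
        definition = fields.get(name)
        # Fields with no known definition are treated as restricted (fail closed).
        if definition is not None and public_role in definition.view_roles:
            parts.append(value)
    return "\n".join(parts)
```

Restricted fields never reach the index, so they cannot leak through search matches, whether or not they are displayed.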
SoC project co-ordination...
I hope all of this will be co-ordinated with the Creating a generic Search API for Drupal project.
@dahacouk I think that is what Doug is trying to do here, and I applaud his efforts for doing so. Search development has become fragmented over the years, and this is the strongest initiative I have seen to consolidate our efforts. Therefore, I agree with Doug that the search subsystem maintainers should take a central role in making sure all the efforts that are a result of this initiative are pointing in the same direction. In addition, I think it is the responsibility of people like myself as well as the SoC "students" to make sure we filter information upwards to facilitate coordination.
"Hi" from the GSoC guy
Hi, I'm the infamous GSoC student. ;)
Looking at the discussion so far, there are several great ideas, and I sense a lot of potential here. My project overlaps with this effort in large part (even though it's a contributed module rather than a direct improvement to core), and I'd very much appreciate any direction you could give me. I think joining efforts here could be a big step forward for Drupal search, not only for D8 but also for D7. I'd love for my project to help improve Drupal search, and I'll only be able to do that with feedback from experienced search developers.
Apart from a detailed discussion of further progress, I'd especially be interested in the rationale behind choosing object orientation, and how you think this should work in practice. Should core define some interfaces that search engines have to implement? There are surely a number of pros and cons here, and it would be nice to hear your insights on them.
@douggreen: See here for the GSoC timeline. The official start of coding is May 24. However, since my semester continues until the end of June, I probably won't code very much in the first month or so. Therefore, coding efforts until the end of June could probably be incorporated into my project easily.
And of course, advice, insights, comments and feedback will be appreciated the whole time.
SoC
The Drupal model is to prove it in contrib first, then move it to core. So creating a proof-of-concept in contrib is the right thing to do. Your contrib module should be for 7.x. There will be work beyond the SoC project, but that will be up to us (the Drupal community, but you too), to polish it for core worthiness. But if we start with that goal in mind, hopefully we can build it such that it won't need too much cleanup.
We shouldn't go crazy with OOP. One basic tenet of Drupal is that it should be easy for people to contribute to, and too much OOP makes this difficult. I like many of the 7.x OOP interface classes. The 7.x caching system is a good model: an interface class, a default implementation, sometimes a couple of other implementations, and a variable override to change the class. The interface is OOP, but once you drill down into the implementation, it's procedural.
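The cache-style pattern described above, sketched in Python for illustration: an interface, a default implementation whose methods are plain procedural code, and a settings variable naming the class to use. Drupal 7's cache layer picks a class per bin from a `cache_class_<bin>` variable and falls back to `DrupalDatabaseCache` by default; the search names below (`SearchBackendInterface`, `DatabaseSearchBackend`, `search_backend_class`, `variables`, `registry`) are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class SearchBackendInterface(ABC):
    """The only OOP contract a backend has to satisfy."""

    @abstractmethod
    def index_item(self, item_id: str, text: str) -> None: ...

    @abstractmethod
    def search(self, keys: str) -> List[str]: ...


class DatabaseSearchBackend(SearchBackendInterface):
    """Default implementation; the method bodies are plain procedural code."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}

    def index_item(self, item_id: str, text: str) -> None:
        self.items[item_id] = text.lower()

    def search(self, keys: str) -> List[str]:
        keys = keys.lower()
        return sorted(i for i, text in self.items.items() if keys in text)


# Stand-ins for Drupal's variable storage and for class discovery.
variables: Dict[str, str] = {}
registry: Dict[str, Type[SearchBackendInterface]] = {
    "DatabaseSearchBackend": DatabaseSearchBackend,
}


def search_backend() -> SearchBackendInterface:
    # The variable override decides which class is used; default otherwise.
    class_name = variables.get("search_backend_class", "DatabaseSearchBackend")
    return registry[class_name]()
```

Under this sketch, a contrib module such as a Solr integration would only register its class and set the variable; code that calls `search_backend()` would stay unchanged.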
If you're willing, I'd like to co-mentor you with Robert, and have regular (weekly) meetings to plan and review.
Doug Green
@drunkenmonkey, what do you mean by Views integration here? Are you talking about integration with the Drupal Views module?
We discussed at the Minnesota Search Sprint, abstracting
If you swap the search and index implementations, can the standard display implementation still work? Or do we need to tie all three of these together?
Another nice-to-have use case is intermixing results from two different sources, which is hard to do right now. Say you're searching the world's libraries and want to combine the results from multiple Z39.50 servers: how would you do that?
Doug Green
Yes, Drupal Views module
Yes, I mean integrating the search API with Views, just like e.g. apachesolr_views does at the moment for the apachesolr module. I'm pretty sure that when backends (search implementations) return their search results in a uniform way, and with the help of entity_metadata, we can create a single set of views plugins that will provide all necessary data to display search results from arbitrary search engines.
As for the query over multiple datasources: this indeed could be an interesting use case. I haven't thoroughly thought about it yet, but I think that as long as all backends provide the same set of data fields for each search result (especially relevancy, which might be tricky), it should be possible to implement that. I'll keep it in mind and see how easily this could be added.
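A sketch of merging results from multiple sources, assuming every backend returns the same uniform record with a relevance score. Raw scores from different engines are not directly comparable, so this example rescales each source by its own top score (simple max normalization, a deliberately basic choice; rank-based merging is an alternative). `SearchResult`, `merge_results`, and the source names in the usage are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical uniform result record every backend would return."""
    source: str
    item_id: str
    score: float  # backend-specific relevance; scales differ between engines


def merge_results(result_sets: Dict[str, List[SearchResult]],
                  limit: int = 10) -> List[SearchResult]:
    merged: List[SearchResult] = []
    for source, results in result_sets.items():
        top = max((r.score for r in results), default=0.0)
        for r in results:
            # Divide by this source's best score so each source tops out at 1.0.
            normalized = r.score / top if top > 0 else 0.0
            merged.append(SearchResult(source, r.item_id, normalized))
    # Highest normalized score first; break ties by source, then item id.
    merged.sort(key=lambda r: (-r.score, r.source, r.item_id))
    return merged[:limit]
```

With a Solr set scored 12.0 and 6.0 and a library-catalog set scored 0.9 and 0.3, both top hits normalize to 1.0 and the two lists interleave instead of one engine's larger numbers dominating.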