Search in Drupal 8

Posted by jhodgdon on January 6, 2011 at 8:14pm

I wrote a page a while back (occasionally updated) with ideas for the core Search module in Drupal 8:
http://drupal.org/node/717654
(plus there is the issue queue)

There's also the new Search API project, which makes some strides towards modularizing Search:
http://drupal.org/project/search_api

What I'd like to do here is to start a discussion about what the community thinks the Search module should be for D8. Add your comments here...

[EDIT March 15 2011: added file attachment]

Attachment	Size
Search_core_conversation-fin.pdf	399.9 KB

Comments

I believe search should be a

Posted by merlinofchaos on January 6, 2011 at 8:17pm

I believe search should be a framework and core should provide an optional implementation of that framework. The search framework, itself, becomes entirely .inc files and does not inherently include the visible UI so that if no search modules are on, nothing really happens.

After that, the things that I care about are that Page Manager can treat search as blocks of content and assemble them on a search and search result(s) pages as necessary, and that Views is able to integrate with the search as much as possible.

Integrated + meta search, logging API

Posted by fgm on January 6, 2011 at 9:24pm

Three things spring to mind:

the ability to use an "integrated" search, in which all search indexes are used with just one input box, as on any search engine, and of course the ability to qualify searches per "entity" (for lack of a less loaded word) in the query itself, like the filetype: or site: modifiers on www search engines. Ranking is a hard question in that case.
the ability to use a "meta" search in which the results from the engines are aggregated with those from other, non-Drupal sources. This can probably be done by proxying in some way without major changes, though.
the ability for modules doing query logging (like core search or zeitgeist) to have a reliable, cross-engine way to log searches and result stats. In D6 this was acrobatic. In D7, hook_search_preprocess partly solves the query logging issue, although it is not designed for this, but it does not really wrap the querying process to allow advanced logging.
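
As an illustration of the third point, here is a minimal sketch of what "wrapping the querying process" could look like. It is written in Python for brevity rather than as Drupal code, and every name in it (KeywordEngine, LoggedSearch, the fields of the log record) is hypothetical, not an existing API:

```python
from __future__ import annotations

import time
from typing import Callable


class KeywordEngine:
    """Hypothetical stand-in for any search engine (core, Solr, Sphinx, ...)."""

    def __init__(self, documents: dict[str, str]) -> None:
        self.documents = documents

    def search(self, keywords: str) -> list[str]:
        # Naive substring matching; every word must occur in the text.
        words = keywords.lower().split()
        return sorted(
            doc_id
            for doc_id, text in self.documents.items()
            if all(word in text.lower() for word in words)
        )


class LoggedSearch:
    """Wraps any object with a search() method and reports each query to loggers."""

    def __init__(self, engine, loggers: list[Callable[[dict], None]]) -> None:
        self.engine = engine
        self.loggers = loggers

    def search(self, keywords: str) -> list[str]:
        started = time.perf_counter()
        results = self.engine.search(keywords)
        record = {
            "engine": type(self.engine).__name__,
            "keywords": keywords,
            "result_count": len(results),
            "seconds": time.perf_counter() - started,
        }
        for log in self.loggers:
            log(record)
        return results


log_entries: list[dict] = []
wrapped = LoggedSearch(
    KeywordEngine({"node/1": "Drupal search module", "node/2": "Views integration"}),
    loggers=[log_entries.append],
)
hits = wrapped.search("search")
```

Because the wrapper only relies on a search() method, the same logging code would work for any engine, which is the cross-engine property asked for above.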

Of course, this means core would include two separate parts:

in the CMF part: the combination of this "framework"/tooling,
in the CMS part: one implementation of an engine that can use the CMF part, just like any alternate engine.

Also, is there still any point in implementing all of this in an OpenSearch-compatible way?

I agree with merlin. We

Posted by cpliakas on January 6, 2011 at 8:58pm

I agree with merlin.

We should concentrate on making a framework and a simple implementation of that framework in core. The goal of the framework should be to provide common functionality that contribs such as Apache Solr Search Integration, Search Lucene API, Sphinx, etc. can use so they are not forced to build their own. Although fgm brings up some interesting features, I would really like to focus on building the framework to support creating them in contrib as opposed to adding this functionality directly into core. There is a good developer base in contrib surrounding search, and we should find a way to reduce the large amount of overlap that is happening right now. To bullet list this:

An API for Pluggable backends
More control over search URL format (i.e. don't tie search pages to search/* URLs)
Pluggable indexing so contrib can build things like indexing content returned by a view, blocks of content from Page Manager, external content, whatever.
A unified Facet building and display API which will lay the foundation for a graphical facet building module that will work across all contributed search modules.

There are a lot more that I can think of, but to me these are the most important.
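
To make the first bullet concrete, a rough sketch of an API for pluggable backends might look like the following. It is Python rather than Drupal code, and the class names, the registry, and the toy in-memory backend are all assumptions made for illustration:

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical contract that every backend would implement."""

    @abstractmethod
    def index(self, item_id: str, text: str) -> None:
        """Store one item so it can be found later."""

    @abstractmethod
    def search(self, keywords: str) -> list[str]:
        """Return the IDs of matching items."""


class InMemoryBackend(SearchBackend):
    """Toy reference implementation using substring matching."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def index(self, item_id: str, text: str) -> None:
        self._items[item_id] = text.lower()

    def search(self, keywords: str) -> list[str]:
        words = keywords.lower().split()
        return sorted(
            item_id
            for item_id, text in self._items.items()
            if all(word in text for word in words)
        )


# Registry that contrib code would add its own backends to.
BACKENDS: dict[str, type[SearchBackend]] = {}


def register_backend(name: str, backend_class: type[SearchBackend]) -> None:
    BACKENDS[name] = backend_class


def create_backend(name: str) -> SearchBackend:
    return BACKENDS[name]()


register_backend("memory", InMemoryBackend)
backend = create_backend("memory")
backend.index("view:recent", "Recent content from a view")
backend.index("node/7", "An ordinary node")
```

Note that the item IDs are plain strings, so nothing in the contract assumes that the indexed things are nodes.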

Thanks,
~Chris

Framework

Posted by Crell on January 7, 2011 at 1:05am

As Earl said, it needs to be a framework, not an application.

We also need to cleanly and completely separate indexing (putting stuff into a search silo) and searching (pulling stuff out). Those are two very different operations and there are many, many use cases for wanting to only do one of those independently.

Also, any assumption that what we're putting in or taking out is Drupal data (entities), or even resembles Drupal data, needs to be removed. If we're dealing with that special case, great, let's do, but there are lots of other use cases as well.

I'd also argue that, like Search API, "search pages" need to be a free-standing entity to be placed wherever (as a page, as a panel pane, etc.) Tying everything through the /search path with a 1:1 match to module names is a fundamentally broken approach in many ways.

Great thoughts!

Posted by jhodgdon on January 7, 2011 at 9:41pm

Here's a summary of what I'm seeing above, plus a few additional thoughts:

a) Core Search as a framework, with a reference/default implementation provided.

b) Modularized framework - separate pieces for indexing (with facets that could include a "type"), searching/querying (with facets), display/UI (modularized into blocks), and logging. Given this, the reference implementation would probably be several modules.

c) Ability to search "things" that are not necessarily nodes or even entities. [As the maintainer of a module (Search by Page) that allows you to search generic pages on the site (such as views output), I am definitely on board with this idea. As a note, the Search API module only works with entities.]

d) Ideally, the ability to mix together different "things" in searches.

e) Not mentioned above: we also need to think about preprocessing (for things like stemming, dealing with punctuation, etc.). This should probably be a filter chain, with the ability of any preprocessing module/filter to also have a way of highlighting found words in the search results (because if the search term and search index have been preprocessed, the search terms will not necessarily be exact matches to the text in the result).
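
A small sketch of point (e), in Python rather than Drupal code and with hypothetical names throughout: the same filter chain is applied to the search terms and to each word of the result text, so that a word is highlighted when the preprocessed forms match even though the raw text differs. The "stemmer" here is a deliberately naive stand-in, and reusing the chain for highlighting is a simplification of the per-filter highlighting idea.

```python
from __future__ import annotations

import re
from typing import Callable

Preprocessor = Callable[[str], str]


def lowercase(word: str) -> str:
    return word.lower()


def strip_plural(word: str) -> str:
    # Deliberately naive stand-in for a real stemmer.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def run_chain(word: str, chain: list[Preprocessor]) -> str:
    """Apply each preprocessing step in order."""
    for step in chain:
        word = step(word)
    return word


def highlight(text: str, query: str, chain: list[Preprocessor]) -> str:
    """Wrap words whose preprocessed form matches a preprocessed query term."""
    wanted = {run_chain(word, chain) for word in query.split()}

    def mark(match: re.Match) -> str:
        word = match.group(0)
        if run_chain(word, chain) in wanted:
            return f"<strong>{word}</strong>"
        return word

    return re.sub(r"\w+", mark, text)


CHAIN = [lowercase, strip_plural]
snippet = highlight("Searching modules in Drupal", "Module", CHAIN)
```

Here the query "Module" highlights "modules" in the text, which an exact string comparison would miss.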

I also forgot to include these two links in my original post:
http://groups.drupal.org/node/71988 - long discussion thread on what the Search API module should do and how it should be structured
http://groups.drupal.org/node/63523 - notes from a discussion at DrupalCon San Francisco about the architecture of Search in D8

Drupal programmer - http://poplarware.com
Drupal author - http://shop.oreilly.com/product/0636920034612.do
Drupal contributor - https://www.drupal.org/u/jhodgdon

Hi jhodgdon. Great

Posted by cpliakas on January 8, 2011 at 5:29pm

Hi jhodgdon.

Great initiative, and thanks for starting this thread and summarizing the points listed above.

One thing about the preprocessing: we have to be really careful to also allow search backends to do their preprocessing completely outside of Drupal. Solr / Lucene solutions have analyzers that apply their own preprocessing and stemming that Drupal hook implementations might conflict with. Sphinx also uses the C library libstemmer which might also conflict with text that has already been preprocessed by Drupal. In other words, I think it is really important to improve the preprocessing flow as you mentioned, but backends should have the ability to "opt out" of this workflow if there is the potential for conflicts.

Thanks again for organizing this,
Chris

Good thought

Posted by jhodgdon on January 8, 2011 at 11:34pm

So maybe if there's a hook_search_info() type of thing where your module could register as a preprocessing step, an indexer, a retrieval engine, etc., one of the things an index/retrieval engine could say is "no preprocessing".

Although some preprocessing could be useful, even if the back end is taking care of stemming. For instance, there could be an acronym preprocessor where you could make certain acronyms expand on indexing, and maybe your back end doesn't do stemming for every human language on the planet. So maybe it would be more of a suggestion, where the module could put some notes up on the screen like "We suggest you don't enable a stemming preprocessor, because this already stems English and German".
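
A rough sketch of that idea, in Python rather than Drupal code. The registry layout and key names ("role", "kind", "stems", "languages") are invented for illustration and are not an existing hook_search_info() format:

```python
from __future__ import annotations


def search_info() -> list[dict]:
    """Hypothetical registry contents, as if collected from several modules."""
    return [
        {"name": "solr_engine", "role": "engine", "stems": {"en", "de"}},
        {"name": "porter_stemmer", "role": "preprocessor",
         "kind": "stemmer", "languages": {"en"}},
        {"name": "dutch_stemmer", "role": "preprocessor",
         "kind": "stemmer", "languages": {"nl"}},
        {"name": "acronym_expander", "role": "preprocessor",
         "kind": "acronym", "languages": set()},
    ]


def preprocessing_hints(engine_name: str, registry: list[dict]) -> list[str]:
    """Suggest, but do not enforce, which stemmers overlap with the engine."""
    engine = next(
        item for item in registry
        if item["role"] == "engine" and item["name"] == engine_name
    )
    hints = []
    for item in registry:
        if item["role"] != "preprocessor" or item["kind"] != "stemmer":
            continue
        overlap = sorted(item["languages"] & engine["stems"])
        if overlap:
            hints.append(
                f"We suggest you don't enable {item['name']}: "
                f"{engine_name} already stems {', '.join(overlap)}."
            )
    return hints


hints = preprocessing_hints("solr_engine", search_info())
```

Only the English stemmer triggers a hint in this example; the Dutch stemmer and the acronym expander remain useful because the engine does not cover them.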

jhodgdon, I really like your

Posted by cpliakas on January 30, 2011 at 3:03pm

jhodgdon,

I really like your ideas here. I think what I am concerned most about here is the flexibility to opt out of preprocessing if necessary. For example, version 3.x of my module Search Lucene API is in development and has pluggable backends for things like a PHP port of Lucene and Elastic Search. Elastic Search has a ton of languages included, whereas the PHP port of Lucene has none. Therefore in my case I would only want the Elastic Search backend to opt out of preprocessing but allow the PHP port of Lucene to use Drupal's porter stemmer module. I think that whether a backend uses Drupal's preprocessing hooks should be configurable per backend so that Drupal doesn't make any assumptions as to the needs of every site.

Also, I really like your idea of different "layers". Search Lucene API 3.x is making a conscious effort to have the layers:

Backend: Examples are Zend Framework PHP port of Lucene, Java Lucene, Elastic Search, etc.
Indexer: Backend agnostic, responsible for queuing content for indexing, flagging when content needs to be reindexed, and building content for indexing. Therefore it will have the standard node integration, Views integration, page manager integration, and has the possibility for integrating with Nutch or other crawlers to index external sites.
Index: The physical search index, which is abstracted on the PHP end so the adapters are backend agnostic.
Searcher: Provides an interface for searching indexes. This includes a search form, ability to specify a path, or even bypass this layer all together so search interfaces can be built by other projects such as Views.
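
The four layers can be sketched roughly as follows. This is Python for brevity, not Search Lucene API code; the class and method names are hypothetical, and a dictionary stands in for a real engine:

```python
from __future__ import annotations


class Backend:
    """Talks to the actual engine; a dict stands in for Lucene, Solr, etc."""

    def __init__(self) -> None:
        self._stores: dict[str, dict[str, str]] = {}

    def put(self, index_name: str, doc_id: str, text: str) -> None:
        self._stores.setdefault(index_name, {})[doc_id] = text

    def find(self, index_name: str, keyword: str) -> list[str]:
        store = self._stores.get(index_name, {})
        return sorted(d for d, t in store.items() if keyword.lower() in t.lower())


class Index:
    """Backend-agnostic handle on one physical index."""

    def __init__(self, name: str, backend: Backend) -> None:
        self.name = name
        self.backend = backend

    def add(self, doc_id: str, text: str) -> None:
        self.backend.put(self.name, doc_id, text)

    def query(self, keyword: str) -> list[str]:
        return self.backend.find(self.name, keyword)


class Indexer:
    """Queues content and pushes it into an index; knows nothing about the backend."""

    def __init__(self) -> None:
        self._queue: list[tuple[str, str]] = []

    def enqueue(self, doc_id: str, text: str) -> None:
        self._queue.append((doc_id, text))

    def run(self, index: Index) -> int:
        processed = 0
        while self._queue:
            doc_id, text = self._queue.pop(0)
            index.add(doc_id, text)
            processed += 1
        return processed


class Searcher:
    """Search interface; other code could bypass it and query the index directly."""

    def __init__(self, index: Index) -> None:
        self.index = index

    def search(self, keyword: str) -> list[str]:
        return self.index.query(keyword)


engine_backend = Backend()
site_index = Index("site", engine_backend)
indexer = Indexer()
indexer.enqueue("node/1", "Pluggable search backends")
processed = indexer.run(site_index)
results = Searcher(site_index).search("pluggable")
```

In this arrangement a Searcher can also be pointed at an Index that no Indexer ever filled, which matters when the data is managed outside of Drupal.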

I haven't had the time to take a look at Thomas' Search API module in depth yet, but I am assuming there are similar layers. In my case, the indexer, index adapter, and searcher are all backend agnostic, and even though the module is Lucene based it is totally possible to build Xapian, Sphinx, and other non-Lucene backends. I think it would be helpful for us as the Drupal search community to iron out what the different layers are that should be defined by core and standardize on their nomenclature. This might sound trivial, but it would probably help before any drastic changes are made to D8 search.

Thanks,
Chris

Agreed

Posted by Crell on January 30, 2011 at 5:30pm

I definitely agree with clearly separating the layers. That is part of the "measure twice, cut once" philosophy that leads to good architecture. :-)

One important point to consider is that there is not necessarily a correlation between Drupal entity, Index, and Searcher. I'm working on a project now where we're indexing a massive 3rd party database into Solr, without going through Drupal, and then searching it from Drupal. That means the searcher cannot have any knowledge of entities or entity-schemas at all. It's purely Views-to-arbitrary-Solr logic.

Similarly, one may want to use Drupal as a host to index a 3rd party source without bringing it into Drupal. So "node integration, views integration, page manager integration", etc. are not relevant to it. I would say page manager integration belongs in the Searcher, or an optional component of the searcher. (You may not want anything more than Views in practice.)

I think we need to try drawing pictures and flow diagrams here to come up with the right breakdown.

Not sure about that

Posted by drunken monkey on January 30, 2011 at 6:39pm

I'm working on a project now where we're indexing a massive 3rd party database into Solr, without going through Drupal, and then searching it from Drupal. That means the searcher cannot have any knowledge of entities or entity-schemas at all. It's purely Views-to-arbitrary-Solr logic.

I don't know how far core search can or should be built for such rather exotic use cases. If the use case uses only a small fragment of the API in the first place, we can hardly do much to facilitate it. Of course, it's reasonable that the interface to the server should have a method for a "raw" search, not coupled to any metadata from Drupal, with which the caller can then do as he pleases. It also should (more easily than now) be possible to create a search page without the whole rest of the search "package".
But anything beyond that hardly belongs in a basic search framework, in my opinion.

@ Chris above: Yes, these layers (or, separate "parts") are present in the Search API as well (although I think Indexer and Index are merged there).

Not an edge case

Posted by Crell on January 30, 2011 at 7:50pm

Based on the RFPs I've gotten recently, I don't think "searching data that doesn't live in Drupal's database" is an edge case. :-) I've got at least two pending right now, not counting the one I mentioned above. Especially for large organizations "I want to expose a big-arse legacy system on the web but I don't want to move everything into Drupal nodes" is an increasingly common request. We need to have some straightforward and consistent answer to that case.

That could mean:

1) Indexing non-Drupal-local content into a search server, and then searching it from Drupal and displaying the results.
2) Searching an already-existing search server from Drupal and displaying the results. (How the data got into that server is not Drupal's problem; it just knows there's a Solr/Sphinx/Lucene/whatever server there.)
3) Indexing Drupal content into the same index as non-Drupal content (which got there through either method above), and searching them both together.

We have to be able to support all three of the above use cases, since the Entity and Field APIs in Drupal 7 were designed specifically to allow for mixing and matching of local and remote data.

And that's before we even get into the challenge of multi-sourced entities or entities that only have their full data when they've gone through a proper entity_load(). :-)

@Thomas That's great about

Posted by cpliakas on January 31, 2011 at 4:37pm

@Thomas That's great about the separate parts. The reason why I separated the indexer and index is because, as @Crell said, I too have had to work with data stores that are managed completely outside of Drupal. This use case would have (in my lingo, anyways) a backend, index, searcher, but no indexer since external factors outside of Drupal are managing the data.

As for my specific use case, I have worked on a couple of government projects where a Solr server is set up that has to be searched by Drupal but contains no Drupal data inside of it. Drupal is becoming huge in government and the enterprise, so there will be more and more instances where it will have to integrate with existing infrastructure. I'm not saying that core should support this out of the box, but it should at least provide a basic framework to ease the development process. Not sure, but we might be saying the same thing in this regard.

Sorry to spam this

Posted by cpliakas on January 31, 2011 at 7:39pm

Sorry to spam this discussion...

One more reason to separate indexes and indexers: there may be instances where two separate indexes are indexing the same content. A real-world use case: some government requirements mandate that public and private data cannot reside in the same index. Think indexing public nodes vs. nodes that are only accessible to members of a group. Kind of an edge case, but still something that could easily be accomplished by two indexes using the same indexer to retrieve the node data and build it for the backend. A more common use case might be maintaining separate indexes for separate languages.
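
A tiny sketch of that arrangement, in Python with made-up node data and function names: one shared document-building step feeds two indexes, and each index decides which nodes it accepts.

```python
from __future__ import annotations

from typing import Callable


def build_document(node: dict) -> tuple[str, str]:
    """Shared indexer step: turn a node into (id, text) for any index."""
    return f"node/{node['nid']}", f"{node['title']} {node['body']}"


def fill_index(nodes: list[dict], accepts: Callable[[dict], bool]) -> dict[str, str]:
    """Build one index from the nodes that this index is allowed to hold."""
    return dict(build_document(node) for node in nodes if accepts(node))


# Hypothetical content: "group" is None for public nodes.
NODES = [
    {"nid": 1, "title": "Press release", "body": "Public news", "group": None},
    {"nid": 2, "title": "Meeting notes", "body": "Members only", "group": "staff"},
]

public_index = fill_index(NODES, lambda node: node["group"] is None)
private_index = fill_index(NODES, lambda node: node["group"] is not None)
```

The same pattern would cover the per-language case by swapping the acceptance test for a language check.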

Important discussion, thanks Jennifer!

Posted by drunken monkey on January 29, 2011 at 3:55pm

Sorry I'm late for the party, I forgot to keep an eye on the group post-D7 …

As is the general opinion here, I too think that core search has to gain a lot more flexibility to be more to search developers than something they'll have to hack around in order to get their solution working. Separating the framework from the implementation is a central step here. Right now enabling the search module practically implies that you want to index nodes in the database, and want to have search tabs at search/*.
Also, a lot more flexibility is needed in the workflow. We need to let people (both module developers and site admins) choose more freely what they want to do with their data for indexing/searching.

My module, the Search API has been mentioned, and (probably not very surprisingly) I really think this is the way to go. It already takes care of most of the things that were mentioned in this discussion, and I feel that the missing ones can be added pretty easily, as long as there is a consensus on them.

Separate UI from the framework: We'd only need to move the admin.inc into a new module, along with the hook_menu().
"Meta" searches: As fgm says himself, I too don't think this really needs to be supported by the framework – as long as the framework is flexible, someone can easily come and implement this in some way. I'm pretty sure, you could do this with the Search API (and maybe even core search) already.
Indexing anything: As Jennifer said, right now the Search API relies on entities and the Entity API, since that offers a clean, uniform interface for loading items, accessing metadata and keeping track of created/updated/deleted data. However, with flexibility in mind, I'm pretty sure we can extend this to let other modules define searchable content without having to introduce "fake entities" (which would probably be a way of doing this with Search API right now).
I think that the Entity API will make it into D8 core, so we could at least build on that for traditional node indexing, etc. Then we could create an additional abstraction layer on top of that, that has the Entity API just as one possible implementation of the principle, using some kind of an adapter. And contrib modules could easily add their own ways of collecting items and metadata for the Search API.
Let search backends opt out of preprocessing: As Jennifer says, I don't think it's a good idea to let backends completely opt out of preprocessing, since they won't know beforehand what kind of crazy stuff a preprocessor might pull off. Making it clear in the documentation / description for e.g. Solr servers that the user should deactivate most preprocessors should be sufficient.

Views integration, searching multiple indexes at once, extensive support for facets, support for query logging (although I didn't even think of this), pluggable backends, separation of individual tasks (indexing vs. providing search pages), very flexible preprocessing, etc., are all already part of the Search API (or of separate modules working with it, even though some of this is backend-dependent, as I don't think that all backends can or should be expected to support e.g. faceting).

Therefore, I'd think that improving the Search API and moving it into D8 core would really be the way to go here. The core framework, UI and the database backend would then probably be in core, while the other parts would live on in contrib modules (unless Views gets moved into core and we can keep search_api_views there as well ;)).

(Hm, there are rather few people in this discussion — maybe someone should write a short Planet post mentioning it? I myself only discovered this through Jennifer's comment in Eaton's D8 blog post …)

Hi Thomas. Thanks for

Posted by cpliakas on January 30, 2011 at 4:10pm

Hi Thomas.

Thanks for weighing in. Obviously your experiences are super relevant here, but I do have a couple of disagreements with what you posted.

First, in my opinion it is a completely unacceptable solution to rely on documentation that backends should deactivate most of their preprocessing stuff. Let me give a real world example. Java Lucene uses "Analyzer" classes to preprocess text. People might have added custom analyzers to do things like index/search punctuation found in programming languages so that source code can be searched. If our solution is to rely solely on documentation to remove these analyzers, we are telling that person they have to strip out the custom analyzers they coded in Java and try to use Drupal's preprocessing to make it work. Drupal core should be about providing flexibility and eliminating assumptions. Allowing site administrators the ability to opt in or out of Drupal preprocessing per index via the admin interface would eliminate any assumptions and give the administrator the flexibility to utilize/deactivate Drupal preprocessors based on their needs.

Second, I am hesitant to say "lets use Search API for D8" at this time because it doesn't have mass adoption yet. Just to be clear, I have the utmost respect, gratitude, and admiration for what you accomplished with Search API. It very well might be a perfect starting point for D8 search, but I want to see a lot more people using it and a lot more maintainers developing on top of it before that decision is made. To help with adoption, I am committed to making any backend I develop for Search Lucene API 3.x pluggable to the Search API module as well so we can iron out any issues we haven't yet encountered.

Thanks,
Chris

Seems we agree

Posted by drunken monkey on January 30, 2011 at 7:18pm

First, in my opinion it is a completely unacceptable solution to rely on documentation that backends should deactivate most of their preprocessing stuff.

Sorry, I seem to have phrased that wrongly. Yes, it would be completely unacceptable to tell users to turn off their backend's preprocessors! I meant exactly what you said, that site admins should be able to choose freely what preprocessing should be done for what data. E.g., in the Search API module: Solr server description, preprocessor form.
All I wanted to say was that I wouldn't find it good if the "backend plugin" (server object, whatever) had the power to circumvent the processors — as you said, the site admin should have the power to make that decision, the backend plugin should just be able to give hints in that respect.

And by the way, I agree with your comment a little further up: first thing we should definitely agree on the separate parts/layers and a uniform nomenclature for them. The Search API has the server (responsible for the actual connection to the search server, sending search or indexing requests, etc.), the index (backend-agnostic information about what gets indexed (entity type and what fields should be indexed with what types) and how (pre-/post-processors and things I call "data alterations" that can add additional fields or filter out items before they are indexed)) and the query (object to encapsulate a search request — keywords, fields to be searched, filters, range, ordering, …). Other modules then build upon this and e.g. use the query to provide search pages or views for indexes.
I'm holding a Search API session at the Drupal Dev Days in Brussels next Sunday, I could then post the slides here to make my architecture (hopefully) clearer.

Second, I am hesitant to say "lets use Search API for D8" at this time because it doesn't have mass adoption yet. Just to be clear, I have the utmost respect, gratitude, and admiration for what you accomplished with Search API. It very well might be a perfect starting point for D8 search, but I want to see a lot more people using it and a lot more maintainers developing on top of it before that decision is made.

You are right, that was a bit much to say this early. I agree with you that the system first has to be tested, limitations of and possible bugs in the framework found, etc. With two backends written by other contributors now, this has at least started, and I hope the trend will continue.
If the final decision is to build an all-new search module (or use e.g. parts of your Lucene API as the basis) and the result is flexible enough to make my module obsolete in D8, that's equally fine by me. But since the Search API module already solves most of the problems mentioned here (and, I think, in not too bad a manner) there are at the very least several things to learn from it for core search.

The luceneapi 3.x branch sounds pretty interesting, too. I didn't know that it's that flexible now, with even non-Lucene backends possible. I'll try it out as soon as I find some time!

Thanks for the clarification,

Posted by cpliakas on January 31, 2011 at 4:16pm

Thanks for the clarification, Thomas. I look forward to developing for the Search API module once I get my own house in order, because your stuff is definitely the closest we have been as a Drupal community to a generic search API. Look forward to further discussions, and I am getting pretty excited about the possibilities in front of us.

~Chris

Not entities

Posted by jhodgdon on January 31, 2011 at 4:13pm

Given all of the above discussion, forcing everything to think/behave like entities (or even to be in the Drupal database) is a non-starter for me. The framework needs to be more general, so that there's an idea of passing "stuff" to the search "engine" for indexing (where "stuff" could be entities or might not be, and where the indexing step might not be necessary for all engines), running a search query on a search engine, and displaying search results. Plus some preprocessing and other details.

I don't think that results from more than one search query engine can be merged together realistically, because of paging and ranking, although they could be displayed on tabs or on completely different URLs. I also think we need the concept of search "environments", where you could search one group of stuff with one engine at one URL, and a different group of stuff with a different engine at a different URL.
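
To illustrate the "environments" concept, here is a minimal sketch in Python. The SearchEnvironment structure, the paths, and the engine and source names are all hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchEnvironment:
    """One group of stuff, searched with one engine, at one URL."""

    path: str
    engine: str
    sources: tuple[str, ...]


ENVIRONMENTS = {
    env.path: env
    for env in (
        SearchEnvironment("search/site", "core_database", ("node", "user")),
        SearchEnvironment("search/docs", "solr", ("external_manuals",)),
    )
}


def environment_for(path: str) -> Optional[SearchEnvironment]:
    """Look up which environment, if any, handles a given URL path."""
    return ENVIRONMENTS.get(path)


docs_environment = environment_for("search/docs")
```

Each environment is queried on its own, so paging and ranking never have to be reconciled across engines.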

Agreed

Posted by Crell on January 31, 2011 at 4:23pm

If we can make it really really easy to throw an arbitrary entity at an arbitrary indexer, that's great, but I agree that has to be an implementation of a more general thingie. (Wow, a more general thingie than an entity? Meta!) Cross-engine searches are also a non-starter. My point above was that we need to be able to feed, say, nodes, media files, and a 3rd party data source into a single Solr core and then search that one core. The apachesolr module for D7 will be able to do that shortly (Peter's in the process of merging our code in for that, I believe), but only for entities.

The ability to have multiple independent search silos that are handled separately, though, is definitely a requirement.

If we can make it really

Posted by drunken monkey on January 31, 2011 at 9:35pm

If we can make it really really easy to throw an arbitrary entity at an arbitrary indexer, that's great, but I agree that has to be an implementation of a more general thingie. (Wow, a more general thingie than an entity? Meta!)

In the Search API, the index currently extracts all relevant data from entities before passing them on to the preprocessors and then to the server for indexing. All the server gets is an array with the field values and types. So basically the Search API's design should be able to handle other data sources, as long as the data-extracting step in the index is abstracted. And, of course, the "Fields" form, resp. the act of retrieving the metadata.
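
A sketch of what abstracting that data-extracting step might look like, in Python rather than as actual Search API code. The DataSource protocol, both example sources, and the field names are assumptions; the point is only that the server receives field values and types without knowing where they came from:

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Protocol


class DataSource(Protocol):
    """Hypothetical contract: describe the fields, then extract their values."""

    def field_types(self) -> dict[str, str]: ...

    def extract(self, item: Any) -> dict[str, Any]: ...


class EntitySource:
    """Source for entity-like objects that expose attributes."""

    def field_types(self) -> dict[str, str]:
        return {"title": "text", "created": "date"}

    def extract(self, item: Any) -> dict[str, Any]:
        return {"title": item.title, "created": item.created}


class RowSource:
    """Source for rows from a third-party database (plain dictionaries)."""

    def field_types(self) -> dict[str, str]:
        return {"title": "text", "price": "decimal"}

    def extract(self, item: Any) -> dict[str, Any]:
        return {"title": item["name"], "price": item["cost"]}


def prepare_for_server(source: DataSource, item: Any) -> dict[str, dict[str, Any]]:
    """Build the array of field values and types that the server would receive."""
    field_types = source.field_types()
    values = source.extract(item)
    return {
        name: {"type": field_types[name], "value": values[name]}
        for name in field_types
    }


entity_doc = prepare_for_server(
    EntitySource(), SimpleNamespace(title="Hello", created="2011-01-31")
)
row_doc = prepare_for_server(RowSource(), {"name": "Widget", "cost": 9.5})
```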

Admittedly, choosing to only let entities be indexed by the Search API was a bit of a cowardly shortcut, as entities just had the whole requirements already perfectly set in place. I agree that indexing other things than entities is an important use case for a new core search framework.

My point above was that we need to be able to feed, say, nodes, media files, and a 3rd party data source into a single Solr core and then search that one core.

The multi-index searches module adds this functionality to the Search API, too, at least for Solr (and other backends that want to support this).

Bringing this back up

Posted by drunken monkey on March 4, 2011 at 4:36pm

Hm, this has been lying around for a while now. But maybe after DrupalCon there'll be new discussions.

Just wanted to note here that there is now an issue for the Search API that might soon fix the problem of indexing and searching data from non-entity sources: Integrating Non-Drupal Data.

Also, at the Drupal Dev Days Brussels last month, Markus Kalkbrenner talked to me about the difficulties of adapting searches in Drupal 7 for internationalization, which is proving really hard to do due to the multiple translation possibilities (content vs. field translation). Of course, we don't really know yet what I18n will look like in D8, but this is definitely also something we have to take better care of.

Search at DC Chicago

Posted by fgm on March 4, 2011 at 4:54pm

Note that there is a "core conversation" session at DC Chicago on this topic (which I plan on attending), led by pwolanin:

http://chicago2011.drupal.org/coreconv/core-drupal-search-architecture-d8

Darn!

Posted by drunken monkey on March 5, 2011 at 2:25pm

Sounds very promising. Sadly, I won't be in Chicago, so I'll be unable to provide my input there. :-/ But of course, most of what I have to say is already posted here. And the proposed solution sounds great, I'm eagerly awaiting the session's outcome!
Is there a planned way to afterwards publish the results? Will they be posted in here?

Also see the D8 battle plans.

Notes from Drupalcon BOF

Posted by damienmckenna on March 12, 2011 at 2:28pm

Both jhodgdon and I took notes from the Search BOF, so we'll paste them here.

Notes from Drupalcon BOF, part 2

Posted by damienmckenna on March 12, 2011 at 2:29pm

Continued from jhodgdon's notes when she had to leave the BOF early.

Trac example, highlights need for phrases.
Facets UI - needs to be in core with reference implementation.
Views in core? Unsure, but ability to index a Views page.
SearchAPI currently has a preprocessor, useful for adding e.g. stemmer.
API level, lessons from @crell's API talk, e.g. "One is a special case of Many".
Starting point: consider CTools as "core", migrate SearchAPI to its plugin architecture.
Search display: pluggable, excerpts,
SearchAPI already supports pluggable excerpts
Would help if there was a diagram to explain the architecture.
Support for different types of filters - plain text entry, selects, auto-complete, sliders, etc,
- Delegate to form builder.
Ability to have multiple pages, e.g. a customized focused search for particular data structures. Already supported by SearchAPI.

My notes

Posted by jhodgdon on March 14, 2011 at 5:17pm

I guess that's a subtle hint that I should put my notes in:

1) Use ctools architecture for plugins as first pass; plan to migrate to new core plugin api when it exists (from the Butler project)
- At Larry Garfield's core conversation, I asked him about this and he said that was probably a reasonable plan, and also that there may be working code for a plugin and context system by the end of May. If we do start with ctools, we should definitely use the OO version of ctools plugins, but we might just want to wait.

2) Need to set up a Wiki with the specs

3) Want a plugin architecture rather than using hooks, because:
- can use or not use plugins from each module
- plugins can inherit from each other
- Definitely want the newer OO plugin style in CTools

4) What to include in the reference core plugins
- Search nodes and users
- Base classes for all plugins that are independent of Drupal core
- Subclass to make the Drupal-reference impl
- Facets: user vs. node and node type and author
- Filtering also needs to be a plugin
- Search - simple not boolean, but make sure a plugin could do that
but we need phrases
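
As a rough illustration of points 3 and 4, here is a minimal Python sketch of plugins that inherit from each other: an agnostic base class, a generic simple (non-boolean) search, and a subclass that overrides only the matching rule to add phrases. All class and method names are hypothetical and invented for this sketch; they are not taken from CTools, Search API, or core.

```python
from abc import ABC, abstractmethod


class SearchPlugin(ABC):
    """Hypothetical agnostic base class; it knows nothing about Drupal."""

    @abstractmethod
    def search(self, keywords):
        """Return a list of matching item IDs."""


class InMemorySearch(SearchPlugin):
    """Simple (non-boolean) search: every keyword must appear as a word."""

    def __init__(self, items):
        self.items = items  # maps item ID -> text

    def matches(self, text, keywords):
        words = text.lower().split()
        return all(k.lower() in words for k in keywords)

    def search(self, keywords):
        return [item_id for item_id, text in self.items.items()
                if self.matches(text, keywords)]


class PhraseSearch(InMemorySearch):
    """Inherits everything; overrides only the matching rule for phrases."""

    def matches(self, text, keywords):
        # The keywords must appear as consecutive words, in order.
        words = text.lower().split()
        phrase = [k.lower() for k in keywords]
        n = len(phrase)
        return any(words[i:i + n] == phrase
                   for i in range(len(words) - n + 1))


items = {1: "search api module", 2: "api for search"}
simple_hits = InMemorySearch(items).search(["search", "api"])
phrase_hits = PhraseSearch(items).search(["search", "api"])
```

With hooks, swapping the matching rule would mean reimplementing the whole search; here the subclass replaces a single method.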

Hmmm... I was going to attach Peter's and my slides from the Core Conversation, but it doesn't look like gdo supports attachments. Sigh.

Drupal programmer - http://poplarware.com
Drupal author - http://shop.oreilly.com/product/0636920034612.do
Drupal contributor - https://www.drupal.org/u/jhodgdon

File attached above

Posted by jhodgdon on March 14, 2011 at 5:20pm

I guess you can't attach files to comments, but you can to the main post, so I went back up to the top and attached Peter's final version of the slides from the core conversation there.

Thanks a lot for taking the

Posted by cpliakas on March 14, 2011 at 7:16pm

Thanks a lot for taking the time to compile the notes and post them to the group!

~Chris

Ditto at Chris

Posted by drunken monkey on March 14, 2011 at 10:37pm

Thanks for putting these up here!
This all looks pretty good to me, and it is really gaining momentum. And I agree, a wiki page for creating a detailed specification before diving in would probably be a good thing, too.

Would help if there was a diagram to explain the architecture.

A diagram of the architecture we want to build, or of the architecture of the current Search API? The latter is contained in the slides of my Dev Days presentation; the former we should definitely make before starting the actual coding.

Filtering also needs to be a plugin

What filtering is meant here? Filtering by some primitive (non-fulltext) fields, the fulltext search, or some kind of external filtering mechanism?
That's a good question arising here: How would you plan to execute a search in the new framework?
Currently, in the Search API, this is done with a query class analogous to database queries in DBTNG. This also allows for most of my interpretations of the above quote.
Additionally, the fulltext keywords to search for are handed to the backend in a parsed form (an array structure, complete with AND/OR and possibly NOT), which is currently taken care of by the search query class. Moving this parsing to a plugin of its own makes a lot more sense, now that I think about it, although it could be argued that doing such things excessively would clutter the list of different plugin classes. It would also make things like this possible:

Search - simple not boolean, but make sure a plugin could do that

How would you think searches should be executed?
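
To make the idea of a separate key-parsing plugin concrete, here is a minimal Python sketch with two interchangeable parsers (simple and boolean) that both produce the same kind of nested structure, plus a small evaluator. The dictionary layout ("conjunction", "negate", "terms") and the function names are assumptions made for this example; they are not the actual Search API array format.

```python
def parse_simple(keys):
    """Simple mode: every word is required, no operators are recognized."""
    return {"conjunction": "AND", "negate": False, "terms": keys.split()}


def parse_boolean(keys):
    """Boolean mode: 'OR' means any word may match, a leading '-' excludes."""
    words = keys.split()
    conjunction = "OR" if "OR" in words else "AND"
    positive = [w for w in words if w != "OR" and not w.startswith("-")]
    excluded = [w[1:] for w in words if w.startswith("-") and len(w) > 1]
    tree = {"conjunction": conjunction, "negate": False, "terms": positive}
    if excluded:
        # Wrap as: (positive part) AND NOT (any excluded word).
        tree = {"conjunction": "AND", "negate": False, "terms": [
            tree,
            {"conjunction": "OR", "negate": True, "terms": excluded},
        ]}
    return tree


def matches(tree, words):
    """Evaluate a parsed tree against the set of words of one document."""
    results = [matches(t, words) if isinstance(t, dict) else t in words
               for t in tree["terms"]]
    combined = all(results) if tree["conjunction"] == "AND" else any(results)
    return not combined if tree["negate"] else combined


tree = parse_boolean("drupal OR joomla -spam")
```

Because both parsers emit the same structure, a backend only needs to understand the tree, and a site could switch between simple and boolean parsing without touching the backend.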

Filtering, search queries

Posted by jhodgdon on March 15, 2011 at 9:01pm

Filtering - hmmm. Not sure what that note is about actually. That was written just before I had to leave the BoF to present a session. Anyone else remember what we were talking about?

Search execution - probably it should execute by calling the plugin::execute() method?

Yes

Posted by Crell on March 15, 2011 at 9:14pm

Querying is an action, so it should be a command object much like SelectQuery. ::execute() is a reasonable pattern to continue using there.
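
A minimal Python sketch of that command-object pattern follows: the query only collects keys and conditions, and nothing runs until execute() is called. SearchQuery, ListBackend, and their methods are hypothetical names for illustration, loosely modeled on the chaining style of database queries; they are not an existing API.

```python
class SearchQuery:
    """Hypothetical command object: collects criteria, runs on execute()."""

    def __init__(self, backend):
        self._backend = backend
        self._keys = None
        self._filters = []

    def keys(self, keys):
        self._keys = keys
        return self  # returning self allows chaining

    def condition(self, field, value):
        self._filters.append((field, value))
        return self

    def execute(self):
        # Only here is the backend actually asked to do any work.
        return self._backend.search(self._keys, self._filters)


class ListBackend:
    """Toy backend over a list of dicts, standing in for a search server."""

    def __init__(self, items):
        self._items = items

    def search(self, keys, filters):
        hits = []
        for item in self._items:
            if keys and keys.lower() not in item["text"].lower():
                continue
            if all(item.get(field) == value for field, value in filters):
                hits.append(item["id"])
        return hits


items = [
    {"id": 1, "type": "node", "text": "Search in Drupal 8"},
    {"id": 2, "type": "user", "text": "search admin"},
    {"id": 3, "type": "node", "text": "Views"},
]
backend = ListBackend(items)
node_hits = SearchQuery(backend).keys("search").condition("type", "node").execute()
```

Since the query object holds no backend-specific logic, the same query could in principle be handed to a different backend unchanged.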

"Wiki" started...

Posted by jhodgdon on March 16, 2011 at 7:47pm

Well it's not really a Wiki, but I started a new page in Core Initiatives where we can flesh out the specs for this project:
http://drupal.org/node/1095092

I think I've captured most of the discussion above. Feel free to add/edit, though maybe anything controversial should be discussed here first. Should we have a d8 issue too?

So, what now?

Posted by drunken monkey on April 5, 2011 at 1:40pm

OK, so what are the next steps here? I haven't really worked on such a large Drupal core initiative before, so I'm not sure. I just don't want this to lie dormant for too long and end up not making it into D8 after all.

Do we try to make this one of the "official" D8 initiatives?
Who would be the "initiative owner", or will otherwise coordinate this?
How do we then proceed Git-wise? Create a new branch (in a sandbox project?) to work on the new search.module? Do all developers who want to work on this get commit access and just try not to do everything simultaneously?
What points are still open and need to be discussed and decided upon? E.g., into which modules will the search module be split (search, search_ui, search_default?)?

Not an initiative probably

Posted by jhodgdon on April 5, 2011 at 9:02pm

I don't think we need to make it one of the "official" D8 initiatives, since it will only have an impact on 3 modules I think (search, node, and user). Starting with a sandbox project that's a clone of the entire Drupal repository may make sense, or it may make more sense to just start a sandbox project for the search.module part. Not sure.

I don't know actually who besides you (Thomas / drunken monkey) has time or motivation to work on it. I don't have time to do much except review and contribute to the discussion. I'm not sure who else has time/motivation even to do that much. Who's mentoring you on the GSoC Project?

Module split: In my opinion, there should be a separate module for each piece that someone might want to turn on or off, or at least the plugin classes should be in their own files, for efficiency.

Whole repo

Posted by Crell on April 5, 2011 at 9:16pm

It's more work to make a spinoff repo that only includes search.module. Cloning all of core is quite quick and cheap and is definitely the way to go for this mini-initiative, unless it will be built as a D7 module first.

Plugin classes in their own files, probably. Module packaging should be based on code organization, not on-off capability. If your on-off capability for the site is based on module existence, you're abusing modules. :-) Eg, you should be able to disable a particular indexer (or set to a null indexer) without touching module configuration. The plugin should not be bound to a module. That's an anti-pattern.
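
A minimal Python sketch of that idea: the active indexer is chosen by a configuration value that names a plugin, and a null indexer (a null object) can be swapped in without enabling or disabling any module. The registry, the configuration key, and the class names are all hypothetical and exist only for this example.

```python
class DefaultIndexer:
    """Records each item's words in an in-memory index."""

    def __init__(self):
        self.index = {}

    def index_item(self, item_id, text):
        self.index[item_id] = text.lower().split()


class NullIndexer:
    """Null object: accepts the same calls but stores nothing."""

    def __init__(self):
        self.index = {}

    def index_item(self, item_id, text):
        pass


# Hypothetical registry: plugins are looked up by name, not by owning module.
INDEXERS = {"default": DefaultIndexer, "null": NullIndexer}


def get_indexer(config):
    """Instantiate whichever indexer the site configuration names."""
    return INDEXERS[config.get("indexer", "default")]()


active = get_indexer({})
active.index_item(1, "Hello World")

disabled = get_indexer({"indexer": "null"})
disabled.index_item(1, "Hello World")
```

Calling code never checks whether indexing is switched on; it always calls index_item(), and the configured plugin decides what happens.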

Makes sense

Posted by drunken monkey on April 5, 2011 at 9:59pm

So, what modules would you use in detail? My proposal:

search: The core framework, with the basic skeleton and probably a few generic plugins (default indexer and datasource, maybe also the null indexer, some processors, …). All else will of course depend on this.
search_ui: An admin UI for configuring the whole search process – managing servers, indexes, pages, etc.
search_default: A ready-made search solution that users could ideally just use even without activating the UI module. Could/should probably also include some node-specific plugins (and plugins specific to other core modules).

Questions here would be whether the core framework module shouldn't be completely free from plugins, except (maybe?) the default indexer and datasource; and where to put search pages – search, search_default or into an extra module. If plugins wouldn't be in the core search module, then we could add a fourth module, search_extended (???), including those plugins, search pages and other things that probably will be handy for most people.

Another core conversation?

Posted by drunken monkey on June 14, 2011 at 9:55pm

I just created a proposal for a D8 search core conversation at DrupalCon London. I'll hopefully have enough time in August to come up with a decent plan on what's needed to get the Search API into a core-worthy state for D8, and what points are still open.
I think that would then be a perfect opportunity to discuss all this, raise and discuss concerns (again), etc. Hopefully, I'd then have enough time in the fall to actually come up with some code for this.

Other people who would want to present in a search core conversation are of course welcome to join me.

Great!

Posted by jhodgdon on June 15, 2011 at 2:25pm

Glad you are taking this on! I think you know what my concerns are. I'll definitely be there at the core conversation, but truthfully I am pretty busy with other Drupal responsibilities (documentation mostly), and I don't think I can stay all that involved in Search.

Missing (?): Stemming, Stopwords, i18n and synonyms

Posted by Thomas_Zahreddin on October 28, 2011 at 11:59am

Hi, I'm new to this group and have not read all the docs, so I apologize if my topic is a duplicate.

Often not directly seen as a subset of search:

Stemming -> build stems of terms, so that "walking" and "walked" are results for the search term "walks" (because all are stemmed to "walk")
Stopwords -> words you want to exclude from processing such as indexing: modules working with stopwords are e.g. pathauto or de_stemmer and obviously search, so this should be seen as part of the API
i18n -> e.g. stemming and stopwords are language dependent, so many parts of the API have to take the language into account
synonyms -> often synonyms exist for words: is there a chance to have API functions to enrich search results with synonyms?
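
To illustrate how these four concerns could fit together, here is a minimal Python sketch of language-aware processing steps. The word lists are tiny made-up samples, the suffix stripping is deliberately naive and is not a real stemming algorithm, and all function names are hypothetical.

```python
# Tiny illustrative word lists; a real site would load these per language.
STOPWORDS = {"en": {"the", "a", "of"}, "de": {"der", "die", "das"}}
SUFFIXES = {"en": ("ing", "ed", "s"), "de": ("en", "e")}
SYNONYMS = {"en": {"walk": {"stroll"}}}


def remove_stopwords(words, language):
    """Drop words listed as stopwords for the given language."""
    return [w for w in words if w not in STOPWORDS.get(language, set())]


def stem(words, language):
    """Naive suffix stripping that keeps at least three characters."""
    stems = []
    for word in words:
        for suffix in SUFFIXES.get(language, ()):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        stems.append(word)
    return stems


def expand_synonyms(words, language):
    """Append known synonyms after each word."""
    table = SYNONYMS.get(language, {})
    expanded = []
    for word in words:
        expanded.append(word)
        expanded.extend(sorted(table.get(word, ())))
    return expanded


def process(text, language):
    """Lowercase, split, remove stopwords, then stem."""
    words = text.lower().split()
    return stem(remove_stopwords(words, language), language)


indexed = process("walking", "en")
queried = process("the walks", "en")
```

Every step takes the language as an argument, which is the point of the i18n item: the same pipeline applied with "de" uses different stopwords and suffixes.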
