Building a view from a different source (not Database)

drunken monkey's picture

Hi, everyone!

I'm doing a GSoC project to improve the apachesolr module (See here for details). The core of the project is to find a way to create a view showing results from a Solr search query. And this is where I could use some ideas from people more experienced with Views than I am.

As you know, Views dynamically builds an SQL query which returns the objects to display. But since the Solr search data is not in the database, and also has no SQL interface, Views can't be easily used to display objects from this or similar sources.

I've already found (and published, see apachesolr HEAD) one workaround: a filter which inserts Solr's search results into the Views query as a "WHERE node.nid IN (...)" condition. It should also be possible to turn this into an argument, so that it works not only on the Solr search results page but also as an independent page that executes the Solr query itself.
This approach still has some drawbacks:

  • The Views query as well as the Solr search have to be executed, decreasing performance.
  • In the current state, adding additional filters to the view decreases the number of results displayed, since the apachesolr filter only lets exactly ten nodes through (the actual number is configurable) and the other filters are applied on top of that. It might even show no results at all when there really are some.
    Currently, a workaround is to set the number of results displayed by Solr to a higher number than that of the view, but to completely remedy this one would have to use a Solr search without result limit, further dropping performance.
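
The filter workaround and its second drawback can be sketched as follows. This is only a minimal illustration, not the code in apachesolr HEAD: the function names solr_nid_where_clause() and apply_extra_filter() are hypothetical, and the nids are made up.

```php
<?php
// Hypothetical helper (not part of apachesolr or Views): turns the nids
// returned by a Solr search into a condition for the Views query.
function solr_nid_where_clause(array $nids) {
  // Cast to int so the values are safe to embed in SQL, and drop duplicates.
  $nids = array_values(array_unique(array_map('intval', $nids)));
  if (empty($nids)) {
    // No search hits: use a condition that matches nothing.
    return '1 = 0';
  }
  return 'node.nid IN (' . implode(', ', $nids) . ')';
}

// Hypothetical illustration of the second drawback: any further Views filter
// can only remove nodes from the limited Solr result, never add new ones.
function apply_extra_filter(array $solr_nids, array $nids_passing_filter) {
  return array_values(array_intersect($solr_nids, $nids_passing_filter));
}

// Solr returned three hits (made-up nids), but only nid 7 passes the
// additional filter, so the view shows one row instead of three.
$solr_nids = array(12, 7, 31);
$where = solr_nid_where_clause($solr_nids);           // node.nid IN (12, 7, 31)
$shown = apply_extra_filter($solr_nids, array(7, 99)); // array(7)
```

Raising the Solr result limit makes it less likely that the intersection comes up short, which is the workaround mentioned above, at the cost of a larger IN list.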

One idea from my mentor, who is also the maintainer of the apachesolr module, would be to first execute the Views query and then pass the returned nids to the Solr query, but this would have the same problems as the current approach. Moreover, I think that because a search typically returns fewer nodes than most filters, the effects of the second problem will be far more pronounced this way, unless Solr is really good with such things. However, it remains to be tested.

One other idea I've got would be to create the view programmatically and replace the call to $view->execute() with an equivalent custom function. But then the problem would be that either that function would have to parse the SQL query (or at least make a good attempt) and translate it to one for Solr, or the view couldn't use any predefined filters. In the latter case, it would operate on a new fictional "apachesolr" base table, maybe with its own way to define filters.

So, as you can see, all approaches I can think of have more or less serious flaws.
Which one would you use? Can you think of additional ones, or improvements to the existing ones? What are your thoughts on the whole subject? What advice do you have?

Thanks in advance for your responses!

PS: Oh, and this uses Views 2, before you ask.

Comments

How about Views 1?

jayjaylabz's picture

Hello man! I've been waiting for your release of the Views-Apachesolr integration since late May, and I've been expecting the module this July. And oh, I'm surprised that this is only available for Views 2. The site I co-maintain uses Views 1, and I'm worried that your soon-to-be-released and my long-awaited module might not be usable at all because of the different Views version I have. Please tell me how to deal with this..

Thank you so much man!

Please tell me how to deal

catch's picture

Please tell me how to deal with this..

You'll need to upgrade to Drupal 6 to take advantage of views 2. And it's only Views 2 that's flexible enough to do the stuff DrunkenMonkey is talking about.

Have you looked at the Views Search module?

alex ua's picture

...that might give you some clues. We've been using that for a few projects and it's worked well to do things like per-book searches...

Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology


Thanks for the tip!

drunken monkey's picture

Thanks for the tip, I'll take a look!

regarding your current

fago's picture

Regarding your current approach, I think you should take the nids of the whole result and pass them to Views - only then can you make use of all the features of Views, like sorting by some fields. So you could use Views for representing the results. It might improve performance if you put the result set in a temporary table and let Views join on it. Perhaps you can elaborate on this.

But then one shouldn't make use of additional Views filters, as then the facet counts wouldn't correspond to the result set.
I think this approach is probably the most straightforward, but it is of course not ideal with regard to performance.

Idea: Different views backends

In an ideal world we could have separate Views backends operating on different data sources and one "Views presentation layer" which operates on top of them. The different data sources might be different relational databases, but also things like an RDF store or Solr. But the current (awesome!) Views version doesn't support that in a clean way.

But perhaps it is possible to get it working by doing something like this:
* Override the Views query class with a new one, which is used by the Solr fields for generating the query to Solr. Perhaps the Solr PHP client could be used here?
* Specify a new virtual "solr" base table with all available fields and specify that it uses our new query class.
* Make Views support these different kinds of query classes, so that it instantiates the right one, and define an interface for them.
* Change Views to abstract how the query is run, e.g. by putting this in the query class. Or just override the view class and its build() method with a suitable one? And of course the Solr query would have to return the data in the same structure as the original one...

So we would have a separate base table with a separate Solr query object. We would have to provide supported fields, filters and so on that are suitable for Solr and its query object. But then we could directly use Views to build the queries and interfaces using Solr :)
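
To make the idea of a separate query class more concrete, here is a rough sketch of what such an object might look like. It is only an assumption about the shape of the solution: the class SolrQuerySketch and its methods are hypothetical and come from neither Views nor the apachesolr module. The only things taken from Solr itself are the standard request parameters q, rows, fl, fq and sort.

```php
<?php
// Hypothetical stand-in for a Views query class that targets Solr.
// Handlers would call these methods instead of adding SQL fragments.
class SolrQuerySketch {
  protected $fields = array();
  protected $filters = array();
  protected $sorts = array();

  // Plays the role a query object's add_field() has for SQL:
  // remember the field and return the alias under which it will come back.
  public function add_field($field) {
    $this->fields[$field] = $field;
    return $field;
  }

  // Adds one filter query, quoting the value so phrases stay together.
  public function add_filter($field, $value) {
    $escaped = str_replace(array('\\', '"'), array('\\\\', '\\"'), $value);
    $this->filters[] = $field . ':"' . $escaped . '"';
  }

  // Adds one sort criterion; anything other than "desc" is treated as "asc".
  public function add_sort($field, $direction = 'asc') {
    $direction = strtolower($direction) === 'desc' ? 'desc' : 'asc';
    $this->sorts[] = $field . ' ' . $direction;
  }

  // Builds Solr request parameters instead of an SQL string.
  public function build($keywords, $rows = 10) {
    $params = array('q' => $keywords, 'rows' => (int) $rows);
    if ($this->fields) {
      $params['fl'] = implode(',', array_keys($this->fields));
    }
    if ($this->filters) {
      $params['fq'] = $this->filters;
    }
    if ($this->sorts) {
      $params['sort'] = implode(', ', $this->sorts);
    }
    return $params;
  }
}

$query = new SolrQuerySketch();
$query->add_field('nid');
$query->add_filter('type', 'story');
$query->add_sort('created', 'DESC');
$params = $query->build('drupal', 20);
```

The resulting array would still have to be sent to Solr and the response converted into the row objects Views expects, which is the part the last bullet point above refers to.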

Following this approach would be a really rocky road, but it would offer great and useful functionality. But of course, without the help and goodwill of merlinofchaos it's a dead end, so probably the "straightforward" approach is better suited for you.

Thanks for the detailed response!

drunken monkey's picture

Regarding the straightforward approach: I've now done that (using an unlimited search in Solr to pass all results to the view), and I think it's the best way for this approach. As long as the result isn't too big, which it probably won't be for typical searches, the performance drop won't be too bad. Only for really large sites would this approach be a serious performance strain. At least I think so; I'll have to perform some tests to really quantify this performance penalty.
I hadn't thought of using temporary tables, thanks for the tip. I'll be sure to test that approach when measuring the performance of the different versions.
I also didn't think of the wrong facet counts when filters are used, but I can't really see a way out of that, either.

Regarding the other idea: Yeah, that would probably be the cleanest solution, but as I said, it would only work by changing some Views code or by creating the view programmatically, which wouldn't be practicable in terms of usability. So this approach will probably not work on a large scale, but maybe I'll just try it out and see what the results are and how realistic it would be to patch the Views module accordingly.

It wouldn't be too difficult

merlinofchaos's picture

It wouldn't be too difficult to specify what class is used for the query class and use a different backend, sure. That's 'easy' from Views perspective (though hard from the perspective of having to start from scratch on a whole new backend and have it completely separate from the existing data).

That said, I'm not sure what it buys you, because it's still completely separate from the existing data.

Benefits of separate backend

drunken monkey's picture

Well, if the view gets its data directly from Solr instead of the database, it will a) perform better and b) facets, filters, arguments and sorting will all work like normal. Not sure if it's worth it, though, losing the whole support for the standard backend.

a lot

fago's picture

I think it would buy one a lot - so one can use the power of views to easily create custom searches with the views UI as well as one would benefit from the comprehensive display options and various style plugins. In contrast to other integration solutions it performs better and allows you to potentially list arbitrary external data indexed by solr by just doing the right views integration.

For me this is a key feature when integrating external data sources - suppose you are integrating an RDF store. What is needed for the integrated data is not only an API to access it, but an easy way to list and present it as well as to make it searchable. Views would be the ideal solution for this. So for integrating an RDF store a SPARQL backend would be awesome. Just dreaming... ;)

patch

Services as views data source?

adub's picture

Not regarding the search use case above, but being able to use views as a client for the services module (and other external web services) would be pretty useful. Currently I am building hardcoded blocks and ad-hoc interfaces for filtering data which I think would all fit better into the views builder interface.

I'm trying to accomplish

pbriggs's picture

I'm trying to accomplish something similar in a module I'm writing... http://drupal.org/node/280351 . If it isn't possible to use Views 1 to display non-DB data, are there any other approaches you could suggest? Upgrading to Drupal 6 isn't an option, unfortunately. Any ideas would be great!

Upgrading to drupal 6

jayjaylabz's picture

Thanks, catch, for the tip, but upgrading to Drupal 6 just to use this module is not a very good idea. The site I work on is not just any Drupal site; it's actually a huge site with a lot of contributed modules. The basic thing I want is a Views 1 version of Thomas' module, which I have now also tried to make myself (thanks for the idea, man!). I understand the difference between the Views 1 and Views 2 APIs, and that's where the difficulty arises. Right now, I'm hoping someone will do what I asked, since I'm not so good at module development (I can write code, but I'm ashamed of it being "not drupalish") and I'm basically working around some performance issues on our site.
Please lend me a hand. Thanks anyway...

Troubleshooting

drunken monkey's picture

Not sure where this should go, so I'm reusing this discussion.

I've now started and almost finished building a custom backend and bringing Views to use it. What I'm really glad about is that the things that worked previously are still OK, afaict.
Where I need help is the new backend. Debugging shows that the results get returned correctly from it, exactly like those of a normal query. But they are simply not displayed. The table is built as usual, only all table fields are empty, leaving the output looking like this:

[Screenshot not available: a results table in which every field is empty.]

It's probably too much to ask for someone to check out my current status (in the HEAD of the apachesolr module) and really install and debug it, but maybe someone here has suggestions about where something could be going wrong? I've tried on my own for some time now, but the whole process of building a view is rather complex, so I've had no success yet.

hm, I had a short look at

fago's picture

Hm, I had a short look at your code and perhaps the problem is in your field handler? You override query(), but not render(). Render uses the field alias, which is usually set in query(), so probably it fails.
I'd suggest looking over all the methods of a field handler and making sure everything works fine with your query object.
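
The kind of bug described here can be reproduced in isolation. The sketch below uses simplified stand-ins, not the real Views classes: QueryStub, FieldHandlerStub, BrokenSolrField and FixedSolrField are all hypothetical names, and the thread does not show the actual fix, so this only illustrates how a query() override that forgets to set the field alias leaves render() with nothing to read.

```php
<?php
// Simplified stand-in for a query object: add_field() returns the alias
// under which the value will appear on each result row.
class QueryStub {
  public function add_field($table, $field) {
    return $table . '_' . $field;
  }
}

// Simplified stand-in for a field handler: query() records the alias,
// render() reads the value from the row using that alias.
class FieldHandlerStub {
  public $query;
  public $table_alias = 'node';
  public $real_field = 'title';
  public $field_alias = 'unknown';

  public function query() {
    $this->field_alias = $this->query->add_field($this->table_alias, $this->real_field);
  }

  public function render($values) {
    return isset($values->{$this->field_alias}) ? $values->{$this->field_alias} : '';
  }
}

// Overrides query() for a custom backend but never stores the alias,
// so the inherited render() looks up a property that does not exist.
class BrokenSolrField extends FieldHandlerStub {
  public function query() {
    $this->query->add_field($this->table_alias, $this->real_field);
  }
}

// Same override with the one missing assignment added.
class FixedSolrField extends FieldHandlerStub {
  public function query() {
    $this->field_alias = $this->query->add_field($this->table_alias, $this->real_field);
  }
}

$row = new stdClass();
$row->node_title = 'Hello';

$broken = new BrokenSolrField();
$broken->query = new QueryStub();
$broken->query();
$fixed = new FixedSolrField();
$fixed->query = new QueryStub();
$fixed->query();
// $broken->render($row) gives '' while $fixed->render($row) gives 'Hello'.
```

In this toy version the result rows are correct in both cases, and only the lookup in render() differs, which matches the symptom of a correctly sized table with empty cells.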

OMG!

drunken monkey's picture

Wow, about two weeks of debugging and all it took was a single line at the right place! Devastating.

Anyway, thanks a bunch, you really saved me! I'd probably have gone insane debugging, in the end...
Everyone else, relax, the issue has been solved. I'll go post to the issue queue.

you are welcome :)

fago's picture

you are welcome :)

Interesting. If you don't

moshe weitzman's picture

Interesting. If you don't have a php debugger, you might want to print out the $view object during its various phases. See http://groups.drupal.org/node/10129. An example from there:

<?php
$view = views_get_view('foo');
$view->set_display('page');
$view->set_arguments(array('first', 'second'));
$view->is_cacheable = FALSE;
$view->execute();
foreach ($view->result as $result) {
  foreach ($view->field as $id => $field) {
    if (!empty($view->field[$id]['handler'])) {
      $view->field[$id]['handler']->pre_render($view->result);
      // Do something with this unrendered result object
    }
  }
}
?>

Thanks for the quick answer.

drunken monkey's picture

Thanks for the quick answer. I already tried looking at the view object, but not over the course of the building.

However, I have now followed your advice but couldn't find anything peculiar in the view object with the different backend. Most importantly, the total_rows and result fields always have their correct values.
What should I look for/at in the view object, do you think?

perhaps you could use a custom style plugin?

davidwhthomas's picture

Perhaps you could make a custom style plugin that simply gets the node ids from the Views query.

The nids could then be passed to a Solr query and the query results displayed in the style plugin.

It could also check some of the path arguments for filters etc...
