Connecting to external systems

Anonymous

I've been asked to look at a system for a museum that has an existing collection database and wants to search it with Solr from Drupal 7. I've used Solr in quite a few projects and have always imported the data periodically, but for now they'd rather just connect the two via an as-yet-undefined interface. Is this realistic, or should I stick to my guns on periodic import? I prefer importing because you don't have to worry if the connection goes down, Drupal 'understands' nodes, and I want to index the data with Solr, which I know how to do easily with Drupal!

thx

Comments

Maybe I could help

gaëlg

For a project last year, I had to index an external database and display its data (which were not nodes) in a Drupal site. I used Solr and its Data Import Handler, but built my own module for that (not based on the Apache Solr Search Integration module). Then, as I had a bit of time before the end of my internship, I started refactoring the module to make it generic and integrate it with Apache Solr Search Integration. The generic module works but is not really user-friendly, lacks many features and was never used in production. It's more of a proof of concept. I decided to wait until I need it for a real project before working on it again and eventually sharing it as a drupal.org module.
If you think it could fit part of your needs, I could publish it as a sandbox project so that you could adapt it (it's D6).

Thanks, that would be great!

stevepurkiss

Thanks for your reply - would be great if you could post the code, I'm sure others will benefit from the insight too. It still seems like a lot more work than just importing the data, but perhaps I'll change my mind when I've seen how you approached the problem.

A few qus

dstuart

Hi Steve,

Yes, it's possible; whether it's right depends on a number of factors. Where is the master data stored: in the collection database or in Drupal? Are you going to have to build an import into Drupal from the collections db, and if so, is its sole purpose to populate Solr? If so, then putting the data directly into Solr would make sense. If the information is stored both in Drupal and in the collections db, there is no reason that Drupal can't index its bit while the collections db is pushed via its own method. When you say connect the two via an interface, is this bidirectional, or put or get only?

Regards,

Dave

get

stevepurkiss

Thanks for your reply Dave!

It would be for searching the collections db, where all the data is. It sounds from your email as if that may be quite an easy thing to do - any URLs you can point me to so I can find out more about how to do this? I spent a while searching earlier but couldn't find anything apart from Nutch stuff. Would I just put the XML direct into Solr?
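As a rough illustration of what "putting the XML direct into Solr" could look like, here is a minimal sketch that builds a Solr XML update message (`<add>` containing `<doc>` and `<field>` elements). The record and its field names are made up for illustration and would have to match whatever schema the Solr instance uses; the resulting string would be POSTed to the Solr `/update` handler and followed by a commit.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Build a Solr XML <add> message from a list of dicts."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # Multi-valued fields are sent as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical collection record; these field names are assumptions.
sample = [
    {"id": "obj-1", "title": "Bronze axe head", "material": ["bronze", "wood"]}
]
message = build_add_message(sample)
print(message)
```

Building the message with an XML library rather than string concatenation means characters such as `&` or `<` in the collection data are escaped automatically.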

Schema-free ApacheSolr Passthrough etc

niccolox

hi Dave

have been kicking the tires of the Nutch and Apache Solr Views modules and came across the Apache Solr Multi Server module

have you considered updating those modules, or would you be open to integrating the work on allowing Drupal to access Solr indexes that DO NOT have the Drupal schemas?
https://github.com/atchai/apachesolr_passthru
https://github.com/markbirbeck/apachesolr_views
related is the modified ApacheSolr Views module that works with the ApacheSolr Passthru module above; could we fold those changes back into ApacheSolr Views and Multi Site Server?

it would be really neat if there was a Drupal distro with Nutch, Tika and the various Apache Solr configurations ready to go - especially with schemas ... I guess the distro would have to include two packages, Drupal and a Java WAR ...

have been thinking something suitable for Aegir too, it might have to be a special kind of installation package, CiviCRM has a unique installation config for Aegir
https://wiki.koumbit.net/CiviCrm/Installation
https://redmine.koumbit.net/projects

I wonder if the Apache Solr Multisite module might benefit from having a pull-down option for selecting a schema type? D6, D7, Nutch, Solr etc ?

Sounds great

dstuart

Hey @niccolo,

Sounds like a good plan. The combination of the Apache Solr Views and Multi Server modules will allow you to use non-Drupal schemas, as Multi Server uses Solr's field inspection functionality to grab the available fields and present them to the user when a view is created. I haven't looked at these projects, but it would be great to integrate them, as so often in Drupal duplication dilutes the effort of making great modules. I am "trying" to push more updates into Apache Solr Views at the moment, as I am aware we've dropped the ball on that project in terms of progress.
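To make the field inspection idea concrete, here is a minimal sketch built on Solr's Luke request handler (`/admin/luke`), which reports the fields present in an index. Whether the Multi Server module uses exactly this handler is an assumption here, and the sample response is a trimmed, made-up example of a Nutch-style index rather than real output.

```python
from urllib.parse import urlencode


def luke_url(base_url):
    """URL for Solr's Luke request handler, asking for JSON and no term stats."""
    query = urlencode({"wt": "json", "numTerms": 0})
    return base_url.rstrip("/") + "/admin/luke?" + query


def list_fields(luke_response):
    """Return (name, type) pairs from a parsed Luke response, sorted by name."""
    fields = luke_response.get("fields", {})
    return sorted(
        (name, info.get("type", "unknown")) for name, info in fields.items()
    )


# Trimmed, hypothetical response for an index without the Drupal schema.
sample_response = {
    "fields": {
        "url": {"type": "string"},
        "content": {"type": "text"},
        "tstamp": {"type": "date"},
    }
}

for name, field_type in list_fields(sample_response):
    print(name, field_type)
```

A list like this is what a UI would need in order to offer the available fields when a view is built against a foreign schema.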

"I wonder if the Apache Solr Multisite module might benefit from having a pull-down option for selecting a schema type? D6, D7, Nutch, Solr etc ?"

If you are not using views this makes sense as we would need to alter the search and display fields that Solr and Drupal are expecting.

Regards,

Dave

Hi Steve, It's very possible

milesw

Hi Steve,

It's very possible to access non-Drupal Solr servers. The awesome developers of the Apache Solr Integration module designed it to be quite flexible. The framework (apachesolr.module) is schema-agnostic and provides all the necessary interfaces for Solr. The search module (apachesolr_search.module) is what links the Solr framework to Drupal's core search system. So most of what's involved in using an external system is writing a small module to link your Solr server to the Drupal core search system. Using apachesolr_search.module as a template gets you quite far along. As a huge bonus, it's even possible to build Facet API facets with minimal development.
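At the HTTP level, whatever module links Drupal to the external server ends up sending Solr a select request. The sketch below is not Drupal code: it only shows, in Python, the kind of query (keywords, filter queries, facet fields) such a module would build. The core name `collection` and the field names `period` and `material` are hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base_url, keywords, filters=None, facet_fields=None, rows=10):
    """Build a Solr /select URL with optional filter queries and field facets."""
    params = [("q", keywords), ("wt", "json"), ("rows", rows)]
    for filter_query in filters or []:
        params.append(("fq", filter_query))
    if facet_fields:
        params.append(("facet", "true"))
        for field in facet_fields:
            params.append(("facet.field", field))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core and field names for a museum collection index.
url = build_select_url(
    "http://localhost:8983/solr/collection",
    "bronze axe",
    filters=["period:roman"],
    facet_fields=["material"],
)
print(url)
```

Because the parameters are plain Solr ones, nothing here depends on the index using the Drupal schema; only the field names change.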

If importing the data is straightforward enough, that sounds like a better approach. You'd automatically be able to use all the Solr-related goodies available for Drupal. But if you're talking about a massive amount of data...well, that's up to you. :)

Only one Solr...

stevepurkiss

Hi Miles, thanks for your reply!

There would only be one instance of Solr which would need to index Drupal as well as the external system, which just feels wrong to me, and as you say, importing I'd get all the Solr-goodies easily.

Plus as usual they want the moon-on-a-stick-by-yesterday so I think developing lots of custom modules when I could just use something like the also-awesome feeds module would be a far better approach - depending as you say on amount of data.

Apache Solr Views in D6 and D7

niccolox

gday steve,

have you checked the contrib ApacheSolr Views module in D6?
http://drupal.org/project/apachesolr_views

an earlier tutorial by Robert Douglas
http://acquia.com/blog/views-3-apache-solr-acquia-drupal-future-search
(note, I think you will find that a git clone will give you code that doesn't allow this tutorial to work, but the dev release does work)

as mentioned in an earlier comment in this thread, you will find a github module that allows you to do a Solr search from Drupal against already existing indexes..
https://github.com/markbirbeck/apachesolr_views
https://github.com/markbirbeck/apachesolr_passthru

i.e. theoretically the Guardian's Solr-based API (although I think it's JSON-based, not XML, so I'm unsure whether it will work now, but soon)
http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-conten...

btw, Dave Stuart has Nutch > Solr > Apache Solr Search Integration > Apache Solr Views > Drupal Views working in a demo and in production sites
http://www.archive.org/details/HowToBuildAJobsAggregationSearchEngineWit...

personally, I've managed to get from Nutch > Solr > Apache Solr Search Integration .. but my dev toolchain is breaking down at that point i.e. no Nutch derived Solr Views (yet)

obviously, in D7, Apache Solr Views are in-built to the Apache Solr Search Integration module... i.e. Views can be treated like Nodes/Entities

as the Guardian puts it, from Publisher to Platform...

I think you will find the greatest hurdle is finding the versions and configurations that work; it's pretty poorly documented, but this toolchain is the future

ps: wouldn't it be cool if Drupal Views could do Solr as easily as RSS

Sandbox project

gaëlg

http://drupal.org/sandbox/gaelg/1242930

Yep, it's probably more work than importing. But if you want your site to stay up-to-date you will have to do regular re-imports of the external database, or handle remote modifications.

Re: Sandbox project

stevepurkiss

Thanks!

Solr and database

velebak

I'm working on a similar integration myself. I have a MySQL database that drives our newspaper editorial workflow in a proprietary system, and want to hook that into a D7 site here.

No matter what, if they plan to make any changes to the existing database you need to be able to handle them and keep Solr updated.
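One way to picture "keeping Solr updated" is an incremental sync that compares the source rows with what is already indexed. The sketch below assumes the source table exposes an `id` and a `modified` timestamp per row, which may not be true of a given collection database; all names are hypothetical, and it only plans the work rather than talking to Solr.

```python
from datetime import datetime


def plan_sync(rows, indexed_ids, last_sync):
    """Split source rows into ids to (re)index and ids to delete from Solr.

    rows: iterable of dicts with 'id' and 'modified' (datetime) keys.
    indexed_ids: set of ids currently present in the Solr index.
    last_sync: datetime of the previous successful run.
    """
    source_ids = set()
    to_index = []
    for row in rows:
        source_ids.add(row["id"])
        # New rows and rows changed since the last run need (re)indexing.
        if row["id"] not in indexed_ids or row["modified"] > last_sync:
            to_index.append(row["id"])
    # Anything indexed but no longer in the source was deleted remotely.
    to_delete = sorted(indexed_ids - source_ids)
    return sorted(to_index), to_delete


# Made-up data: 'a' is unchanged, 'b' was edited, 'c' is new, 'd' was removed.
last_run = datetime(2011, 8, 1)
source_rows = [
    {"id": "a", "modified": datetime(2011, 7, 1)},
    {"id": "b", "modified": datetime(2011, 8, 5)},
    {"id": "c", "modified": datetime(2011, 8, 6)},
]
to_index, to_delete = plan_sync(source_rows, {"a", "b", "d"}, last_run)
print(to_index, to_delete)
```

The same comparison applies whether the documents are pushed straight into Solr or imported into Drupal first; only the step that consumes the two lists differs.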

I can't see a solution that doesn't require you to follow a path similar to what milesw suggests, unless you're doing a full-on db conversion and migration.

Lucene, Nutch and Solr
