ApacheSolr
Optimizing Apachesolr for non-English languages
I have done a lot of research on optimizing Apachesolr for non-English languages, especially German. It turns out that search results can be dramatically improved by adjusting Solr's stemming and by breaking up compound words. Both can be achieved with slight changes to Apachesolr's schema.xml.
This post is about configuring stemming:
http://www.early-dance.de/en/news/9188-optimizing-apachesolr-non-english...
And this post is about compound word splitting, which is needed in languages like German that have long compound words such as "Dampfschifffahrt":
http://www.early-dance.de/en/news/9189-apachesolr-issues-german-and-othe...
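To illustrate what compound word splitting means for indexing, here is a minimal sketch in Python. It is not Solr's implementation and not the configuration from the linked post; the tiny word list and the function names are assumptions made up for this example. The idea is that a compound is broken into known dictionary words, and those parts are indexed alongside the full word so that a search for "Schiff" can also match "Dampfschifffahrt".

```python
# Toy dictionary-based compound splitter (illustration only).
# DICTIONARY and all function names are hypothetical, not part of Solr.

DICTIONARY = {"dampf", "schiff", "fahrt"}


def split_compound(word, dictionary, min_part=3):
    """Return one split of `word` into dictionary words, or None if none exists."""
    word = word.lower()

    def helper(start):
        if start == len(word):
            return []  # consumed the whole word
        # Try the longest candidate part first, then shorter ones.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in dictionary:
                rest = helper(end)
                if rest is not None:
                    return [part] + rest
        return None  # no dictionary word fits at this position

    return helper(0)


def index_tokens(word, dictionary):
    """Tokens to index: the original word plus its parts, if it is a compound."""
    tokens = [word.lower()]
    parts = split_compound(word, dictionary)
    if parts and len(parts) > 1:
        tokens.extend(parts)
    return tokens


if __name__ == "__main__":
    print(index_tokens("Dampfschifffahrt", DICTIONARY))
```

A real setup needs a much larger word list, and the quality of the splits depends heavily on that list.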
Project Mercury Alpha 6: Now With Solr!
I'm happy to announce the 0.6 Alpha release of the Mercury AMI, now including ApacheSolr as the search backend! This is the last piece of major infrastructure we want to integrate into the stack for scalability purposes. You can now move from a single-server install based on Mercury to a best-practice vertically scaled architecture with separate hardware to run front-end cache, application, back-end cache, search and database!
The quickest way to find it is by searching Amazon EC2 for "Pantheon" or "Mercury". The manifest paths for the latest release (in 32-bit and 64-bit flavors) are:
chapter3-storage/PANTHEON-pressflow-mercury-alpha-6.manifest.xml
chapter3-storage/PANTHEON-pressflow-mercury64-alpha-6.2.manifest.xml (back!)
If you'd like to "roll your own" we've updated the wiki instructions page with a new set of instructions for getting Solr up and running as part of the process. Feel free to improve that documentation, as it's definitely a community process.
This will likely be one of the last releases before we move the project into the Beta phase, at which point we'll be focusing on fine-tuning, stability and portability to non-EC2 systems more than on new features. If you have ideas for additional things you'd like to see integrated in the stack, please chime in. We're also going to be documenting real-world "how to" use cases (e.g. "how do I put my existing site on Mercury") in user-friendly detail, so stay tuned for that.
As always, let us know what you think of the release, what you'd like to see in future iterations, and how your experience is in using the stack. There's plenty more to come.
Architecture question re: huge indexes
I intend to have a Drupal 6 site with some custom node types indexed by ApacheSolr using the XML schema provided. However, I'd like to add a lot of gigantic indexes to it that reside only in the Lucene/Solr system (ha) and not physically in my Drupal MySQL database. For example, I may have a thousand or even a hundred thousand nodes in Drupal, but I might have ten different "external" indexes with millions of records each, and I'd like to conduct faceted search against the whole lot of 'em.
Apachesolr and Stemming in other languages
When I search for "tools" and for "tool" with Apachesolr, I get the same results. As far as I know, the reason is Apachesolr's stemmer, which reduces the word "tools" to "tool" and uses that form for the search within the index. As far as I know, the stemmer is only aware of English stemming rules.
So how do things work when using Apachesolr with other languages? For example, I tried the German words "Kunden" (clients, plural) and "Kunde" (client, singular) and got different result sets. Apparently the stemmer doesn't know the German stemming rules ... right?
How can the search results be improved for languages other than English? Are there German stemmers available to plug into Apachesolr?
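The behaviour described above can be reproduced with a toy suffix stripper. The following Python sketch is only an illustration under simplifying assumptions: the two suffix lists and the `stem` function are made up for this example and are far cruder than the stemmers Solr actually uses. It shows why an English-only rule conflates "tools" and "tool" but leaves "Kunden" and "Kunde" as two different index terms, and how a German-aware rule would map both to the same stem.

```python
# Toy suffix-stripping stemmer (illustration only, not Solr's algorithm).
# The suffix lists below are hypothetical and deliberately minimal.

ENGLISH_SUFFIXES = ("s",)
GERMAN_SUFFIXES = ("en", "n", "e")


def stem(word, suffixes, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # English rule on English words: both forms share one stem.
    print(stem("tools", ENGLISH_SUFFIXES), stem("tool", ENGLISH_SUFFIXES))
    # English rule on German words: the two forms stay different.
    print(stem("Kunden", ENGLISH_SUFFIXES), stem("Kunde", ENGLISH_SUFFIXES))
    # German rule on German words: both forms share one stem.
    print(stem("Kunden", GERMAN_SUFFIXES), stem("Kunde", GERMAN_SUFFIXES))
```

Because the same stemming is applied at index time and at query time, two word forms only return the same results if the rules reduce them to the same stem.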
RDF for Solr: Possible improvements
The Apache Solr RDF module is now in a state where it can, in theory, already be used. However, there is much room for improvement, so I'd like to discuss some possible ways to achieve it.
RDF for Solr: Possible implementation strategies
(For information about my project, see here. In short, it's about enabling Solr to index RDF data via Drupal.)
Before starting the actual coding, even on prototypes, the basic options for implementing this will have to be discussed. At the moment, my mentors and I see the following three possibilities:
Multisite Search using ApacheSolr module
Hi,
Can anyone let me know if it is possible to index and search multiple Drupal and non-Drupal websites using the ApacheSolr module?
If not, please let me know of any other way this could be achieved.
Thanks
Solr RDF Support
Overview
This project is about adding RDF support to the popular ApacheSolr module in the form of a Solr RDF contrib module. The module should be able to read an RDF class specification and automatically generate the necessary mapping to a Solr server, provide the capability to search resources of that type, and generate facets based on its properties. It would even be possible to build the existing node search capabilities completely on top of this mechanism! In any case, you could also add arbitrary other types such as users or taxonomy terms, or resources from other websites altogether.
Help backport ApacheSolr D6 to D5
Hi Everyone,
Today I took a couple of hours and attempted a backport of ApacheSolr D6 to D5. This will bring all of the cool features that D6 has to D5. The patch needs work; it doesn't function yet. If anyone has time to chip in and work on the indexing of nodes (currently not working), that'd be great. With a little momentum I think we can do the backport relatively quickly, after which keeping the two in sync will be much easier.
Using Apachesolr module as an API
Just want to share my experience with using the apachesolr module as an API. I use the apachesolr module purely as an interface between my own Drupal modules and the Solr instance. Part of the reason is that I use a different schema.xml from the one provided with the apachesolr module.
For the technical details, read this issue.
Thanks a lot to the developers for creating the apachesolr module and making it nicely extensible!





