Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views
I am giving a talk at BADCamp on Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views
http://2013.badcamp.net/sessions/big-data-drupal-cloudera-hadoop-mapredu...
I am trying to gather some other experts (e.g. in Cloudera / Hadoop / MapReduce, HyperDrupal, Twig, etc.) to come and handle the bigger and deeper questions.
https://drupal.org/node/2104503
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/Uwuj1q7bWBY
Solr Nutch Sandbox Modified to a simple search interface
Dear Lucene, Nutch, Solr, and NJ colleagues:
The Solr Nutch project has been changed to a simple search interface that does one thing--search Solr indexes that were crawled by Nutch, using Nutch's preferred schema. Please give it a try if you use Nutch.
http://drupal.org/sandbox/cilefen/1858412
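For readers who want to check a Nutch-built index outside Drupal first, here is a minimal sketch of the kind of keyword query such a search interface sends to Solr. It only builds the URL for Solr's standard `/select` handler. The base URL `http://localhost:8983/solr`, the helper name `build_select_url`, and the returned fields `title` and `url` (taken from Nutch's stock schema) are assumptions for illustration, not details of the sandbox project.

```python
from urllib.parse import urlencode


def build_select_url(solr_base, terms, rows=10, fields=("title", "url")):
    """Return a Solr /select URL for a simple keyword search.

    solr_base and fields are illustrative assumptions; adjust them to
    match your own Solr core and the schema your Nutch crawl wrote to.
    """
    params = {
        "q": terms,               # the user's search terms
        "fl": ",".join(fields),   # stored fields to return
        "rows": rows,             # maximum number of hits
        "wt": "json",             # ask Solr for a JSON response
    }
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "drupal nutch"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) against a running Solr instance should show whether the crawled pages are searchable before Drupal is involved.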
Easy Integration on SOLR-Nutch-Drupal
Hello,
Can anybody point me to a step-by-step guide on integrating Solr, Nutch, and Drupal?
What I am trying to achieve is to crawl some data from various websites and create a comparison platform for them. Any help will be much appreciated.
Many Thanks in advance.
Shashank
Big Data Drupal with Cloudera, Hadoop, MapReduce, Nutch and Solr
Thanks to the recent work on the Solr Nutch sandbox project, I've managed to get Nutch 1.6 jobs to run on a four-node Cloudera CDH3 cluster, sending results to Solr 3.6.2 (hosted within Tomcat on Aegir BOA). The results are then integrated, via the Apache Solr 7.1.1 module (not the dev release), into search results and Apache Solr Views.
I must say, I am pretty excited about Hadoop / Cloudera running Nutch and Solr and integrating with Drupal.
For anyone interested in setting up a Cloudera cluster, I recommend masterschema (CentOS) and Gregory Grubbs on YouTube (Debian).
I'll post some notes etc. ASAP.
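Until those notes appear, here is a rough sketch of the Nutch-to-Solr step mentioned above, assuming a Nutch 1.x layout where a crawl directory holds `crawldb`, `linkdb` and `segments`. It only assembles the `bin/nutch solrindex` command line; it does not run it. The Solr URL, the crawl directory name and the helper name `solrindex_command` are hypothetical, and on a Hadoop cluster the job would normally be launched from Nutch's deploy runtime rather than a local checkout.

```python
def solrindex_command(solr_url, crawl_dir, nutch_bin="bin/nutch"):
    """Build the argument list for a Nutch 1.x 'solrindex' run.

    Assumes the usual crawl layout: <crawl_dir>/crawldb,
    <crawl_dir>/linkdb and <crawl_dir>/segments. All paths and the
    Solr URL passed in are placeholders chosen by the caller.
    """
    crawl_dir = crawl_dir.rstrip("/")
    return [
        nutch_bin, "solrindex",
        solr_url,                        # where the documents are sent
        crawl_dir + "/crawldb",          # crawl database
        "-linkdb", crawl_dir + "/linkdb",
        "-dir", crawl_dir + "/segments"  # index every segment in this dir
    ]


if __name__ == "__main__":
    cmd = solrindex_command("http://localhost:8080/solr", "crawl")
    print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run` once the paths have been checked against your own installation.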
Solr Nutch Search Sandbox Project Updated to Integrate with Common Schema
Hello all:
Based on our discussion last month on IRC, I reconfigured this sandbox project as a few Nutch settings that create an index compatible with the common schema for the apachesolr module.
http://drupal.org/sandbox/cilefen/1858412
The purpose is ad-hoc crawling and indexing, with searching done within Drupal and the results integrated with the Drupal node results.
This is for Nutch 1.x only at this stage.
Solr Nutch Search Sandbox Project Added
Hi All,
I just added "Solr Nutch Search", a sandbox project.
http://drupal.org/sandbox/cilefen/1858412
I welcome your feedback. Let me know if it is good enough for a full project, in which case I could use a co-maintainer.
-Chris McCafferty
Nutch 2.1, Solr 4.0 etc
Nutch 2.1 seems to work quite nicely with Solr 4.0, and I am wondering whether others have tried sending results to the Search API and / or Apache Solr Search Drupal modules?
There are lots of possibilities for integrating web crawls into Drupal views, searches, etc.
Nutch 2.1 / Solr 4.0 (Gora + MySQL), set up using this tutorial:
http://nlp.solutions.asia/?p=180
Nutch 2.1 + Aegir BOA?
http://drupal.org/node/1851318
Drupal Nutch module and 2.1?
http://drupal.org/node/1851324
Drupal Elastic Search module and Nutch
http://drupal.org/node/1851064
Desiring help with Solr, Nutch, Facet API
Mathematic Arts is a Drupal development firm in Milwaukee. We recently developed a web site for a research library in Drupal 7, and implemented Solr and Nutch for the search facility. We are using Facet API to filter search results based on a few simple criteria, but would like to do some more complicated filters and to improve the user experience.
For example:
- Have a facet like content type, but that aggregates many of the general content types that are meaningless to a user. For example:
Parsing Views Feeds
I am new to Nutch, and am attempting to parse a site that has two Views blocks on the front page, both also providing feeds.
My first attempt to parse resulted in the following error:
parser not found for contentType=application/xhtml+xml
I attempted to fix this by editing conf/parse-plugins.xml, where I added:
Now, when I attempt to parse, I get the following:
Install and Configure Nutch in 5 minutes
Ok, here we go. This information is only relevant to those wishing to start out with Nutch for the first time, or to developers who test various Nutch functions and have to tear down and set up again to confirm results. There are many ways to do it, but this works for me. Also, there are scripts here, so run them at your own risk. If you don't know, ask.
1.) Log in to your server using ssh and create a directory named /stuff
a.) mkdir /stuff
b.) touch nutch.sh
c.) vi nutch.sh




