Nutch

niccolox's picture

Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views

I am giving a talk at BADCamp on Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views.

http://2013.badcamp.net/sessions/big-data-drupal-cloudera-hadoop-mapredu...

I am trying to gather some other experts (e.g. in Cloudera / Hadoop / MapReduce, HyperDrupal, Twig, etc.) to come and handle the bigger and deeper questions.

https://drupal.org/node/2104503
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/Uwuj1q7bWBY

cilefen's picture

Solr Nutch Sandbox Modified to a simple search interface

Dear Lucene, Nutch, and Solr group members and NJ colleagues:

The Solr Nutch project has been changed to a simple search interface that does one thing: it searches Solr indexes built from Nutch crawls, using Nutch's preferred schema. Please give it a try if you use Nutch.

http://drupal.org/sandbox/cilefen/1858412
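
For anyone wondering what "searching a Solr index that was crawled by Nutch" amounts to, here is a minimal sketch in Python. It is not the sandbox module's code. It only builds the kind of select URL a search front end might send. The Solr address (`http://localhost:8983/solr/select`) and the helper name `build_search_url` are assumptions made up for illustration, and the field names (`title`, `url`, `content`) are taken from the schema.xml that ships with Nutch 1.x, so check them against your own schema.

```python
from urllib.parse import urlencode

# Assumed for illustration: a local Solr instance and the field names
# (title, url, content) from the schema.xml that ships with Nutch 1.x.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(keywords, rows=10, base_url=SOLR_SELECT_URL):
    """Return a Solr select URL that searches the crawled page content.

    Hypothetical helper: keywords are passed through as-is, so Lucene
    special characters in user input are not escaped here.
    """
    params = {
        "q": f"content:({keywords})",  # search the page body indexed by Nutch
        "fl": "title,url",             # only return the title and URL fields
        "rows": rows,                  # number of results to return
        "wt": "json",                  # ask Solr for a JSON response
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("drupal nutch"))
```

Fetching that URL from a running Solr returns the matching pages as JSON. A real interface would also escape user input and handle paging.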

smartcoder's picture

Easy Integration on SOLR-Nutch-Drupal

Hello,

Can anybody point me to a step-by-step guide on combining Solr, Nutch, and Drupal?

What I am trying to achieve is to crawl data from various websites and build a comparison platform for them. Any help will be much appreciated.

Many thanks in advance.

Shashank

niccolox's picture

Big Data Drupal with Cloudera, Hadoop, MapReduce, Nutch and Solr

Thanks to the recent work on the Solr Nutch sandbox project, I've managed to get Nutch 1.6 jobs running on a 4-node Cloudera CDH3 cluster. The jobs send their results to Solr 3.6.2 (hosted within Tomcat on Aegir BOA), which is then integrated through the Apache Solr 7.1.1 module (not the dev release) into search results and Apache Solr Views.

I must say, I am pretty excited about Hadoop / Cloudera running Nutch and Solr and integrating with Drupal.

For anyone interested in setting up a Cloudera cluster, I recommend masterschema (CentOS) and Gregory Grubbs on YouTube (Debian).

I'll post some notes etc. ASAP.

cilefen's picture

Solr Nutch Search Sandbox Project Updated to Integrate with Common Schema

Hello all:

Based on our discussion last month on IRC, I reconfigured this sandbox project as a few Nutch settings that create an index compatible with the common schema for the apachesolr module.

http://drupal.org/sandbox/cilefen/1858412

The purpose is ad-hoc crawling and indexing, with searching done within Drupal and the results integrated with the Drupal node results.

This is for Nutch 1.x only at this stage.

cilefen's picture

Solr Nutch Search Sandbox Project Added

Hi All,

I just added "Solr Nutch Search", a sandbox project.

http://drupal.org/sandbox/cilefen/1858412

I welcome your feedback. Let me know if it is good enough for a full project, in which case I could use a co-maintainer.

-Chris McCafferty

niccolox's picture

Nutch 2.1, Solr 4.0 etc

The latest version, Nutch 2.1, seems to work quite nicely with Solr 4.0, and I am wondering if others have tried sending results to the Search API and/or Apache Solr Search Drupal modules?

There are lots of possibilities for integrating web crawls into Drupal views, searches, etc.

Nutch 2.1 / Solr 4.0 (Gora + MySQL), running using this tutorial:
http://nlp.solutions.asia/?p=180

Nutch 2.1 + Aegir BOA?
http://drupal.org/node/1851318

Drupal Nutch module and 2.1?
http://drupal.org/node/1851324

Drupal Elastic Search module and Nutch
http://drupal.org/node/1851064

sethhill's picture

Desiring help with Solr, Nutch, Facet API

Mathematic Arts is a Drupal development firm in Milwaukee. We recently developed a web site for a research library in Drupal 7, and implemented Solr and Nutch for the search facility. We are using Facet API to filter search results based on a few simple criteria, but would like to do some more complicated filters and to improve the user experience.

For example:

  • Have a facet like content type, but one that aggregates many of the general content types that are meaningless to a user. For example:
ebeyrent's picture

Parsing Views Feeds

I am new to Nutch, and am attempting to parse a site that has two Views blocks on the front page, both also providing feeds.

My first attempt to parse resulted in the following error:

parser not found for contentType=application/xhtml+xml

I attempted to fix this by editing conf/parse-plugins.xml, where I added:

Now, when I attempt to parse, I get the following:


Install and Configure Nutch in 5 minutes

OK, here we go. This information is only relevant to those starting out with Nutch for the first time, or to developers who test various Nutch functions and have to tear down and set up again to confirm results. There are many ways to do it, but this works for me. Also, there are scripts here, so run them at your own risk. If you don't know, ask.

1.) Log in to your server using ssh and create a dir named /stuff
a.) mkdir /stuff
b.) touch nutch.sh
c.) vi nutch.sh
