nutch

niccolox's picture

RFC: Future of Big Data Drupal .COM and .ORG domains etc

At BADCamp 2013 in Berkeley I presented to a full room on Big Data Drupal: Cloudera Hadoop

niccolox's picture

Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views

I am giving a talk at Badcamp on Big Data Drupal: Cloudera Hadoop, MapReduce, Nutch, Solr, Aegir BOA, Drupal 7 ApacheSolr Views

http://2013.badcamp.net/sessions/big-data-drupal-cloudera-hadoop-mapredu...

I am trying to gather some other experts (e.g. in Cloudera, Hadoop, MapReduce, HyperDrupal and Twig) to come and handle the bigger and deeper questions.

https://drupal.org/node/2104503
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/Uwuj1q7bWBY

smartcoder's picture

Easy Integration on SOLR-Nutch-Drupal

Hello,

Can anybody point me to a step-by-step guide on combining Solr, Nutch and Drupal?

What I am trying to achieve is to crawl data from various websites and create a comparison platform for them. Any help will be much appreciated.

Many thanks in advance.

Shashank

niccolox's picture

Big Data Drupal with Cloudera, Hadoop, MapReduce, Nutch and Solr

Thanks to the recent work on the Solr Nutch sandbox project, I've managed to get Nutch 1.6 jobs running on a 4-node Cloudera CDH3 cluster, sending results to Solr 3.6.2 (hosted within Tomcat on Aegir BOA). From there the results are integrated through the Apache Solr 7.1.1 module (not the dev release) into search results and Apache Solr Views.

I must say, I am pretty excited about Hadoop / Cloudera running Nutch and Solr and integrating with Drupal.

For anyone interested in setting up a Cloudera cluster, I recommend masterschema (CentOS) and Gregory Grubbs on YouTube (Debian).

I'll post some notes, etc., ASAP.
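
For readers who want to check that crawled pages actually reached Solr before wiring up Drupal, here is a minimal Python sketch of a query against Solr's standard `/select` handler. It is not taken from the setup described above: the host, port and path in `build_select_url` are hypothetical, and the `url`, `title` and `content` field names are assumed to match the schema Nutch indexes into. The sample response is hand-written in the shape Solr returns for `wt=json`.

```python
import json
from urllib.parse import urlencode


def build_select_url(solr_base, text, rows=10):
    """Build a Solr /select URL that asks for JSON results."""
    # "fl" limits the returned fields; url and title are assumed field names.
    params = {"q": text, "rows": rows, "wt": "json", "fl": "url,title"}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def extract_hits(response_text):
    """Return (total, [(url, title), ...]) from a Solr JSON response."""
    body = json.loads(response_text)["response"]
    hits = [(doc.get("url"), doc.get("title")) for doc in body["docs"]]
    return body["numFound"], hits


# Hypothetical host and path; substitute your own Solr instance.
url = build_select_url("http://localhost:8080/solr", "content:hadoop", rows=5)

# Hand-written sample response, not real crawl output.
sample = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"url": "http://example.com/", "title": "Example"}]}}'
)
total, hits = extract_hits(sample)
```

Fetching `url` with any HTTP client and passing the body to `extract_hits` would give a quick count of indexed documents matching the query.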

cilefen's picture

Solr Nutch Search Sandbox Project Added

Hi All,

I just added "Solr Nutch Search", a sandbox project.

http://drupal.org/sandbox/cilefen/1858412

I welcome your feedback. Let me know if it is good enough for a full project, in which case I could use a co-maintainer.

-Chris McCafferty

niccolox's picture

Nutch 2.1, Solr 4.0 etc

The latest version of Nutch, 2.1, seems to work quite nicely with Solr 4.0, and I am wondering whether others have tried sending results to the Search API and/or Apache Solr Search Drupal modules?

There are lots of possibilities for integrating web crawls into Drupal views, searches, etc.

Nutch 2.1 / Solr 4.0 (Gora + MySQL), set up using this tutorial:
http://nlp.solutions.asia/?p=180

Nutch 2.1 + Aegir BOA?
http://drupal.org/node/1851318

Drupal Nutch module and 2.1?
http://drupal.org/node/1851324

Drupal Elastic Search module and Nutch
http://drupal.org/node/1851064

niccolox's picture

Stanford Drupal Camp - OpenScholar session

My presentation on OpenScholar at Merritt College, Oakland, has been accepted for the Stanford DrupalCamp.

Any suggestions for topics, formats or co-presenters are welcome.

https://drupalcamp.stanford.edu/sessions/drupal-openscholar-solr-aegir-n...

Thanks

agatlin's picture

Newbie SOLR Questions

We are hoping to use SOLR in a couple of non-standard implementations, and I just have a few questions.

  1. If we want to index all of the documents in a specific file directory (e.g. TIFF images of scanned documents), can we do this directly with SOLR, or do we need Nutch? (I realize TIFF is a legacy format, but in this instance conversion to PDF is not an option.)

  2. If there are specific documents on a remote site we want to index with SOLR (again, specific TIFF documents), what is the best way to accomplish this? (We have the specific URLs for these documents.)
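
One possible starting point for the first question, sketched in Python under stated assumptions: Solr ships an extracting request handler (Solr Cell, built on Apache Tika) usually mapped to `/update/extract`, which accepts an uploaded file plus a `literal.id` parameter for the document id. The sketch only walks a directory and builds one request URL per TIFF; the function name, the `*.tif*` pattern and the use of the file path as the id are illustrative choices, and whether useful text comes out of a scanned TIFF depends on the extraction setup, which is not covered here.

```python
from pathlib import Path
from urllib.parse import urlencode


def extract_requests(directory, solr_base, pattern="*.tif*"):
    """Yield (file_path, url) pairs, one per matching file under directory."""
    # rglob descends into subdirectories; sorting keeps the order stable.
    for path in sorted(Path(directory).rglob(pattern)):
        if path.is_file():
            # Using the file path as the Solr document id is an assumption.
            params = {"literal.id": path.as_posix(), "commit": "true"}
            yield path, solr_base.rstrip("/") + "/update/extract?" + urlencode(params)
```

Each yielded URL would then receive an HTTP POST with the file's bytes as the request body. The pattern match is case-sensitive on most Linux filesystems, so files named `.TIF` would need a second pattern.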

Anonymous's picture

Anyone working on the nutch module for 6?

Hi there,

Is anyone working on getting the Nutch module working for Drupal 6? Does anyone know of other avenues to get full-text document search (.pdf, .doc, etc.) in Drupal 6?

Thanks!
