nutch

smartcoder's picture

Easy Integration on SOLR-Nutch-Drupal

Hello,

Can anybody point me to a step-by-step guide on integrating Solr, Nutch, and Drupal?

I am trying to crawl data from various websites and build a comparison platform from it. Any help will be much appreciated.

Many thanks in advance.

Shashank

niccolo's picture

Big Data Drupal with Cloudera, Hadoop, MapReduce, Nutch and Solr

Thanks to the recent work on the Solr Nutch sandbox project, I've managed to get Nutch 1.6 jobs running on a 4-node Cloudera CDH3 cluster. The jobs send their results to Solr 3.6.2 (hosted within Tomcat on Aegir BOA), and the Apache Solr 7.1.1 module (not the dev release) then integrates them into search results and Apache Solr Views.

I must say, I am pretty excited about Hadoop / Cloudera running Nutch and Solr and integrating with Drupal.

For anyone interested in setting up a Cloudera cluster, I recommend masterschema (CentOS) and Gregory Grubbs on YouTube (Debian).

I'll post some notes as soon as I can.
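One way to check that crawled pages actually reached Solr before wiring them into Drupal is to query Solr's standard select handler directly. The following is a minimal sketch, not the setup described above: the base URL is a placeholder, and the field names `id`, `url`, and `title` are assumptions about the schema in use. It only builds the query URL and parses a JSON response body, so fetching the URL is left to whatever HTTP client is at hand.

```python
import json
from urllib.parse import urlencode


def build_select_url(solr_base, query, rows=10, fields=("id", "url", "title")):
    """Build a URL for Solr's /select handler that asks for a JSON response.

    solr_base is a placeholder such as "http://localhost:8080/solr";
    the field names are assumptions about the index schema.
    """
    params = {"q": query, "rows": rows, "fl": ",".join(fields), "wt": "json"}
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def summarize_response(raw_json):
    """Return (total hit count, list of 'url' values) from a Solr JSON response."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], [doc.get("url") for doc in body["docs"]]
```

If the hit count stays at zero after a crawl, the problem is on the Nutch-to-Solr side rather than in the Drupal module configuration.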

cilefen's picture

Solr Nutch Search Sandbox Project Added

Hi All,

I just added "Solr Nutch Search", a sandbox project.

http://drupal.org/sandbox/cilefen/1858412

I welcome your feedback. Let me know if it is good enough for a full project, in which case I could use a co-maintainer.

-Chris McCafferty

niccolo's picture

Nutch 2.1, Solr 4.0 etc

The latest version, Nutch 2.1, seems to work quite nicely with Solr 4.0. Has anyone else tried sending results to the Search API and/or Apache Solr Search Drupal modules?

There are lots of possibilities for integrating web crawls into Drupal views, searches, etc.

Nutch 2.1 / Solr 4.0 (Gora + MySQL), set up using this tutorial:
http://nlp.solutions.asia/?p=180

Nutch 2.1 + Aegir BOA?
http://drupal.org/node/1851318

Drupal Nutch module and 2.1?
http://drupal.org/node/1851324

Drupal Elastic Search module and Nutch
http://drupal.org/node/1851064

mattp52's picture

Senior Drupal/PHP Developer - Wellington, New Zealand | HeadFirst Limited

Employment type: Full time
Telecommute: Not allowed

HeadFirst are a web development company based in Wellington, New Zealand. We provide enterprise-grade Drupal-based solutions for public sector, corporate, and select start-up clients. We are looking for a full-time Senior Developer/Technical Lead proficient with PHP and Drupal. This is a hands-on position: you will be responsible for leading a team of 3-4 developers building Drupal solutions for our clients.

niccolo's picture

Stanford Drupal Camp - OpenScholar session

My presentation on OpenScholar at Merritt College, Oakland has been accepted for the Stanford DrupalCamp.

Any suggestions for topics, formats, or co-presenters are welcome.

https://drupalcamp.stanford.edu/sessions/drupal-openscholar-solr-aegir-n...

Thanks

maxmmize's picture

Nutch Urlfilter | Eric

Employment type: Contract
Telecommute: Allowed

I have a fairly complex URL filter that needs to be created, and I just don't have the time to figure it out. I am looking for someone who can develop a good URL filter for one specific site.

It's a very small task, I know. But it needs to be done right, and I can learn a lot from getting one done professionally.

Again:

1 URL filter for 1 specific site.
The URL filter I have in mind is fairly complex (at least in my eyes).

Payment is per agreement.
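For anyone drafting such a filter, a small harness for trying rules against sample URLs before a crawl can help. The sketch below assumes the rule semantics of Nutch's regex URL filter file: each rule is a `+` or `-` followed by a regular expression, the first rule that matches anywhere in the URL decides, and a URL that matches no rule is rejected. The sample rules and the `www.example.com` site are hypothetical, and Python's regex syntax differs from Java's in some details, so treat it as a quick check rather than a replacement for testing in Nutch itself.

```python
import re

# Hypothetical rules for an imaginary site; not taken from the post above.
SAMPLE_RULES = r"""
# skip common image and asset suffixes
-\.(gif|jpg|png|css|js)$
# skip URLs containing characters typical of queries
-[?*!@=]
# accept product pages only
+^https?://www\.example\.com/products/
"""


def parse_rules(text):
    """Parse '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False
```

Because the first match wins, exclusions generally need to appear before the broad `+` rule that would otherwise accept the URL.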

agatlin's picture

Newbie SOLR Questions

We are hoping to use SOLR in a couple of non-standard implementations, and I just have a few questions.

  1. If we want to index all of the documents in a specific file directory (e.g. TIFF images of scanned documents), can we do this directly with SOLR, or do we need Nutch? (I realize TIFFs are a legacy format, but in this instance conversion to PDF is not an option.)

  2. If there are specific documents on a remote site we want to index with SOLR (again, specific TIFF documents), what is the best way to accomplish this? (We have the specific URLs for these documents.)
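As a rough sketch of how both cases might look without Nutch, Solr's extracting request handler (`/update/extract`, backed by Apache Tika) accepts a document per request, either as the request body or, if remote streaming is enabled on the server, by URL through the `stream.url` parameter. The helper names and the base URL below are assumptions for illustration, and the code only builds the request URLs. Note also that a scanned TIFF has no text layer, so an OCR step would be needed for anything beyond file metadata to become searchable.

```python
from pathlib import Path
from urllib.parse import urlencode

TIFF_SUFFIXES = {".tif", ".tiff"}


def find_tiffs(directory):
    """Return all TIFF files under a directory (recursively), sorted by path."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    )


def extract_url_for_file(solr_base, path):
    """URL to which a local file's bytes would be POSTed; the path serves as the id."""
    params = {"literal.id": str(path), "commit": "true"}
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def extract_url_for_remote(solr_base, doc_url):
    """URL asking Solr to fetch doc_url itself (requires remote streaming enabled)."""
    params = {"literal.id": doc_url, "stream.url": doc_url, "commit": "true"}
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)
```

Committing on every request keeps the example short; for a large directory it would be cheaper to commit once at the end.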

mikas's picture

Senior Solutions Architect required | CNS

Employment type: Contract
Telecommute: Not allowed

Senior Web solutions architect required!

The role is based in Reading, Berkshire, a 5-minute walk from Reading station and just a 25-minute commute from London Paddington. This is a great opportunity to work with other experts on a large, award-winning Drupal site.

Are you an expert in designing scalable web solutions using most of the following technologies?

PHP PYTHON MySQL LUCENE SOLR APACHE TOMCAT MEMCACHED

Do you have experience with content indexing, clustering, taxonomies, and ontologies?

Are you passionate about open source?

pearlbear's picture

Anyone working on the nutch module for 6?

Hi there,

Anyone working on getting the nutch module working for Drupal 6? Any folks know of other avenues to get full-text document search (.pdf, .doc, etc.) in Drupal 6?

Thanks!
