Does your Solr schema match the data output by Nutch? It’s up to you to create 
a Solr schema that matches Nutch's output; see the Nutch documentation for the 
field definitions. Nutch defines those fields, not Solr.
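
Separately, the "No IndexWriters activated" message quoted below may be worth 
a look. The plugin.includes value in the original message lists no indexer-* 
plugin. As a sketch only, and assuming the message refers to the index writer 
plugins that Nutch 1.7 and later use (indexer-solr for Solr), the property 
could be tried with indexer-solr added. That this is the cause is an 
assumption, not something confirmed in this thread. Overrides like this 
typically go in conf/nutch-site.xml; the rest of the list is copied unchanged 
from the original message:

```xml
<!-- Assumption: adding indexer-solr activates the Solr index writer. -->
<!-- All other entries are unchanged from the original message. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|indexer-solr|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```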

-- Jack Krupansky

From: Xavier Morera 
Sent: Thursday, April 10, 2014 12:58 PM
To: solr-user@lucene.apache.org 
Subject: Pushing content to Solr from Nutch

Hi, 

I have followed several Nutch tutorials - including the main one, 
http://wiki.apache.org/nutch/NutchTutorial - to crawl sites. Crawling works: I 
can see in the console the pages being crawled and the directories being built 
with the data. But for the life of me I can't get anything posted to Solr. The 
Solr console doesn't even blink, so presumably Nutch is not sending anything.

This is the command I run, which crawls and in theory should also post to Solr:
bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2


I also found that I could use this one once the content has already been crawled:
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*


But no luck.

The only thing that caught my attention is the following message. I read that 
adding the property below would fix it, but it does not:
No IndexWriters activated - check your configuration


This is the property:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

Any ideas? I am running Apache Nutch 1.8 on Java 1.6 via Cygwin on Windows.

-- 

Xavier Morera
email: xav...@familiamorera.com

CR: +(506) 8849 8866
US: +1 (305) 600 4919 
skype: xmorera
