Hi Tony,
Strangely, I started looking into the Solr/Nutch integration yesterday, so I might be able to help :)

The documentation for it is very sparse, but the trunk of nutch does have the solr integration committed.
If I remember correctly, what I had to do was...

I went through one of the nutch setup guides and set it up as if I wasn't going to use solr. (Can't remember which one, sorry).

I copied the crawl script from http://www.foofactory.fi/files/nutch-solr/crawl.sh into my nutch directory. I was running this under the soy-latte JVM on OS X, and I had to modify the script a little to pick up filenames instead of permission strings.

This line was changed (note the 'cut' command):

    SEGMENT=`bin/hadoop dfs -ls $BASEDIR/segments|grep $BASEDIR|cut -d' ' -f17|sort|tail -1`

I also changed the second-to-last line to match the required parameters for the new Solr indexer:

    bin/nutch org.apache.nutch.indexer.solr.SolrIndexer http://localhost:8983/solr/ $BASEDIR/crawldb $BASEDIR/linkdb $SEGMENT
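The fixed field number in the 'cut' command above depends on how many spaces the listing happens to contain on a given platform, so it may need adjusting elsewhere. A slightly more forgiving sketch is below. It assumes the path is the last whitespace-separated column of the `bin/hadoop dfs -ls` output and that segment paths contain no spaces; `latest_segment` is a hypothetical helper name, not something from crawl.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of crawl.sh): read a directory listing on
# stdin and print the newest segment path under the given base directory.
latest_segment() {
    basedir=$1
    # Keep only lines that mention the base directory, take the last
    # whitespace-separated field (assumed to be the path), then sort.
    # Segment names are timestamps, so the last one sorted is the newest.
    grep "$basedir" | awk '{ print $NF }' | sort | tail -1
}

# Usage sketch, assuming the same variables as the crawl script:
# SEGMENT=$(bin/hadoop dfs -ls "$BASEDIR/segments" | latest_segment "$BASEDIR")
```

Taking the last field with awk avoids counting columns by hand, at the cost of breaking on paths that contain spaces.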

Copy the schema.xml from the Nutch config directory into a fresh Solr install and start it up. Run crawl.sh, and you should end up with content in your Solr instance.
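One way to check that documents actually arrived is to ask Solr for a match-all query and read the numFound attribute from the XML response. The sketch below assumes the default XML response format and the same http://localhost:8983/solr/ URL used for the indexer; `num_found` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr XML response on stdin and print the
# value of the first numFound attribute it contains.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -1
}

# Usage sketch (assumes Solr is running locally on the default port):
# curl -s 'http://localhost:8983/solr/select?q=*:*&rows=0' | num_found
```

A count of 0 after the crawl finishes would suggest the indexer step did not post anything to Solr.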

I probably won't be able to answer many Nutch-related questions, but that's how I managed to get it up and running.

Toby.

On 6 Mar 2009, at 11:27, Andrzej Bialecki wrote:

Tony Wang wrote:
Hi Hoss,
But I cannot find documents about the integration of Nutch and Solr in
anywhere. Could you give me some clue? thanks

Tony, I suggest that you follow Hoss's advice and ask these questions on nutch-user. This integration is built into Nutch, and not Solr, so it's less likely that people on this list know what you are talking about.

This integration is quite fresh, too, so there are almost no docs except on the mailing list. Eventually someone is going to create some docs, and if you keep asking questions on nutch-user you will contribute to the creation of such docs ;)


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com
