Thank you all so much! I sincerely appreciate the help received.

Tony

On Fri, Mar 6, 2009 at 5:02 AM, Toby Cole <toby.c...@semantico.com> wrote:
> Hi Tony,
> Strangely I started looking into the Solr/Nutch integration
> yesterday so I might be able to help :)
>
> The documentation for it is very sparse, but the trunk of nutch does have
> the solr integration committed.
> If I remember correctly, what I had to do was...
>
> I went through one of the nutch setup guides and set it up as if I wasn't
> going to use solr. (Can't remember which one, sorry).
>
> Copy the crawl script from here:
> http://www.foofactory.fi/files/nutch-solr/crawl.sh into my nutch
> directory.
> I was running this under the soy-latte JVM on OSX, and I had to modify the
> crawler a little to pick up filenames instead of permissions strings:
> This line was changed (note the 'cut' command)
> SEGMENT=`bin/hadoop dfs -ls $BASEDIR/segments|grep $BASEDIR|cut -d\  -f17|sort|tail -1`
> I also changed the second to last line to match the required parameters for
> the new solr indexer:
> bin/nutch org.apache.nutch.indexer.solr.SolrIndexer http://localhost:8983/solr/ $BASEDIR/crawldb $BASEDIR/linkdb $SEGMENT
>
> Copy the schema.xml from the nutch config directory into a fresh solr
> install & start it up.
> Run crawl.sh, and you should end up with content in your solr
> instance.
>
> I probably won't be able to answer many nutch-related questions, but that's
> how I managed to get it up and running.
>
> Toby.
>
>
> On 6 Mar 2009, at 11:27, Andrzej Bialecki wrote:
>
>> Tony Wang wrote:
>>
>>> Hi Hoss,
>>> But I cannot find documents about the integration of Nutch and Solr in
>>> anywhere. Could you give me some clue? thanks
>>>
>>
>> Tony, I suggest that you follow Hoss's advice and ask these questions on
>> nutch-user. This integration is built into Nutch, and not Solr, so it's less
>> likely that people on this list know what you are talking about.
>>
>> This integration is quite fresh, too, so there are almost no docs except
>> on the mailing list.
>> Eventually someone is going to create some docs, and if
>> you keep asking questions on nutch-user you will contribute to the creation
>> of such docs ;)
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
> Toby Cole
> Software Engineer
>
> Semantico
> Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
> T: +44 (0)1273 358 238
> F: +44 (0)1273 723 232
> E: toby.c...@semantico.com
> W: www.semantico.com
>
>

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
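
Toby's note about having to change the `cut` field to get filenames instead of permission strings suggests that the column position in the `hadoop dfs -ls` output is not stable across setups. A minimal sketch of one way around that, assuming the path is always the last whitespace-separated field on each listing line and that segment directory names are timestamps that sort correctly as plain strings. The helper name `latest_segment` is hypothetical and is not part of crawl.sh or of what Toby ran:

```shell
#!/bin/sh
# Sketch only: read an `ls`-style listing on stdin and print the newest
# segment path. Assumes the path is the LAST field of each line, so no
# fixed column number (such as -f17) is needed.
latest_segment() {
    # $1: base directory; the fixed-string match skips header lines
    # (for example "Found N items") that do not contain a segment path.
    grep -F -- "$1/segments/" | awk '{print $NF}' | sort | tail -1
}

# Hypothetical usage, mirroring the two commands quoted in the thread:
#   SEGMENT=$(bin/hadoop dfs -ls "$BASEDIR/segments" | latest_segment "$BASEDIR")
#   bin/nutch org.apache.nutch.indexer.solr.SolrIndexer \
#       http://localhost:8983/solr/ "$BASEDIR/crawldb" "$BASEDIR/linkdb" "$SEGMENT"
```

If the listing format on a given platform puts something other than the path last, this assumption fails and the original fixed-column `cut` approach, adjusted by hand, is still the fallback.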