Thank you all so much! I sincerely appreciate the help. Tony

On Fri, Mar 6, 2009 at 5:02 AM, Toby Cole <toby.c...@semantico.com> wrote:

> Hi Tony,
>        Strangely enough, I started looking into the Solr/Nutch integration
> yesterday, so I might be able to help :)
>
> The documentation for it is very sparse, but the Solr integration has been
> committed to Nutch trunk.
> If I remember correctly, what I had to do was...
>
> First I went through one of the Nutch setup guides (I can't remember which
> one, sorry) and set Nutch up as if I weren't going to use Solr.
>
> I copied the crawl script from
> http://www.foofactory.fi/files/nutch-solr/crawl.sh into my Nutch
> directory.
> I was running this under the soy-latte JVM on OS X, and I had to modify the
> crawl script a little so that it picks up file names instead of permission
> strings. This line was changed (note the 'cut' command):
>        SEGMENT=`bin/hadoop dfs -ls $BASEDIR/segments|grep $BASEDIR|cut -d\  -f17|sort|tail -1`
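Selecting field 17 with `cut` counts every single space as a separator, so it only works while the padding in the `hadoop dfs -ls` listing stays exactly the same. A minimal sketch of a less position-dependent variant, assuming the newer listing format in which the path is the last field of each entry line, and that segment names are timestamps (so a plain lexical sort puts the newest last). `latest_segment` is a hypothetical name, not something crawl.sh provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of crawl.sh): print the newest segment path
# from `bin/hadoop dfs -ls <segments dir>` output read on stdin.
# Assumes the path is the last whitespace-separated field of each entry line
# and that segment names are yyyyMMddHHmmss timestamps.
latest_segment() {
  # $1 is the crawl base directory; grepping for it drops the "Found N items" header.
  # awk splits on runs of whitespace, so padding width no longer matters.
  grep -- "$1" | awk '{ print $NF }' | sort | tail -n 1
}

# Possible use inside crawl.sh, replacing the cut-based line:
# SEGMENT=`bin/hadoop dfs -ls $BASEDIR/segments | latest_segment $BASEDIR`
```

Taking the last field rather than a fixed column is the design choice here: it keeps working if user names, group names or sizes change width, but it would break on the older listing format where the path came first.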
> I also changed the second-to-last line to match the parameters required by
> the new Solr indexer:
>        bin/nutch org.apache.nutch.indexer.solr.SolrIndexer \
>            http://localhost:8983/solr/ $BASEDIR/crawldb $BASEDIR/linkdb $SEGMENT
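To check the argument order without touching a live Solr server, the indexer call can be wrapped so that the `nutch` launcher is swappable. This is only a sketch: `index_segment`, `NUTCH` and `SOLR_URL` are hypothetical names, and the argument order (Solr URL, crawldb, linkdb, segment) is simply the one quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around the SolrIndexer call from the modified crawl.sh.
# NUTCH and SOLR_URL can be overridden from the environment.
NUTCH=${NUTCH:-bin/nutch}
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr/}

index_segment() {
  basedir=$1   # crawl directory holding crawldb/ and linkdb/
  segment=$2   # segment to index, e.g. the value of $SEGMENT
  $NUTCH org.apache.nutch.indexer.solr.SolrIndexer \
    "$SOLR_URL" "$basedir/crawldb" "$basedir/linkdb" "$segment"
}

# Possible use inside crawl.sh: index_segment "$BASEDIR" "$SEGMENT"
```

Setting `NUTCH=echo` before calling the function prints the command line instead of running the indexer, which is a cheap way to eyeball the arguments.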
>
> Then I copied schema.xml from the Nutch config directory into a fresh Solr
> install and started Solr up.
> Run crawl.sh, and you should end up with content in your Solr instance.
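The schema step can be scripted too. A minimal sketch, assuming Nutch keeps the schema at `conf/schema.xml` and that the Solr config directory is the one from the Solr 1.3 example distribution (`example/solr/conf`); `install_nutch_schema` is a hypothetical name, and keeping the stock schema as `schema.xml.orig` is an extra precaution, not part of the steps described above.

```shell
#!/bin/sh
# Hypothetical helper: replace a fresh Solr install's schema with Nutch's.
# $1: Nutch directory (schema expected at $1/conf/schema.xml)
# $2: Solr config directory, e.g. apache-solr-1.3.0/example/solr/conf
install_nutch_schema() {
  nutch_home=$1
  solr_conf=$2
  if [ ! -f "$nutch_home/conf/schema.xml" ]; then
    echo "no schema.xml under $nutch_home/conf" >&2
    return 1
  fi
  # keep the stock schema around in case you want to compare later
  if [ -f "$solr_conf/schema.xml" ]; then
    cp "$solr_conf/schema.xml" "$solr_conf/schema.xml.orig"
  fi
  cp "$nutch_home/conf/schema.xml" "$solr_conf/schema.xml"
}

# Then start Solr, e.g. from the example directory: java -jar start.jar
```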
>
> I probably won't be able to answer many Nutch-related questions, but that's
> how I managed to get it up and running.
>
> Toby.
>
>
> On 6 Mar 2009, at 11:27, Andrzej Bialecki wrote:
>
>> Tony Wang wrote:
>>
>>> Hi Hoss,
>>> But I cannot find any documentation about the integration of Nutch and
>>> Solr anywhere. Could you give me a clue? Thanks.
>>>
>>
>> Tony, I suggest that you follow Hoss's advice and ask these questions on
>> nutch-user. This integration is built into Nutch, and not Solr, so it's less
>> likely that people on this list know what you are talking about.
>>
>> This integration is quite fresh, too, so there are almost no docs except
>> on the mailing list. Eventually someone is going to create some docs, and if
>> you keep asking questions on nutch-user you will contribute to the creation
>> of such docs ;)
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>> ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
> Toby Cole
> Software Engineer
>
> Semantico
> Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
> T: +44 (0)1273 358 238
> F: +44 (0)1273 723 232
> E: toby.c...@semantico.com
> W: www.semantico.com
>
>


-- 
Are you RCholic? www.RCholic.com
Gentleness, kindness, respect, thrift, modesty, benevolence, righteousness, propriety, wisdom, trustworthiness
