Is SolrIndex a plugin for Nutch?
Thanks!

-----Original Message-----
From: Uri Boness [mailto:ubon...@gmail.com] 
Sent: August-24-09 4:42 PM
To: solr-user@lucene.apache.org
Subject: Re: solr nutch url indexing

How did you configure nutch?

Make sure you have the "parse-html" and "index-basic" plugins configured. The 
HtmlParser should by default extract the page title and add it to the 
parsed data, and the BasicIndexingFilter by default adds this title to 
the NutchDocument and stores it in the "title" field. All the SolrIndex 
(actually the SolrWriter) does is convert the NutchDocument to a 
SolrInputDocument. So having these plugins configured in Nutch and 
having a field named "title" in the schema should work. (I'm assuming 
you're using the "solrindex" tool.)
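
One way to double-check the plugin part is a small script along these lines. It is only a sketch: it assumes the plugins are enabled through the "plugin.includes" property in nutch-site.xml and that Nutch treats the value as a regular expression matched against whole plugin ids. The sample XML, the plugin list in it, and the helper name missing_plugins are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical nutch-site.xml content; the plugin list is an assumption,
# not a copy of anyone's real configuration.
SAMPLE_NUTCH_SITE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|scoring-opic</value>
  </property>
</configuration>
"""

# The two plugins mentioned above as needed for the title to be indexed.
REQUIRED_PLUGINS = ("parse-html", "index-basic")


def missing_plugins(nutch_site_xml, required=REQUIRED_PLUGINS):
    """Return the required plugin ids that plugin.includes does not match.

    If the property is absent from this file, every required plugin is
    reported, because this sketch does not read nutch-default.xml.
    """
    root = ET.fromstring(nutch_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            pattern = re.compile(prop.findtext("value", "").strip())
            # Assumption: a plugin is enabled when the regex matches its whole id.
            return [p for p in required if not pattern.fullmatch(p)]
    return list(required)


if __name__ == "__main__":
    missing = missing_plugins(SAMPLE_NUTCH_SITE)
    print("missing plugins:", missing if missing else "none")
```

To run it against a real setup, read your own conf/nutch-site.xml into a string and pass that in instead of the sample. It says nothing about the Solr side, so the "title" field in schema.xml still has to be checked by hand.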

Cheers,
Uri

Lassalle, Thibaut wrote:
> Hi,
>
>  
>
> I would like to crawl intranets with nutch and index them with solr.
>
>  
>
> I would like to search mostly on the title of the pages (the one in
> <title>This is a title</title>)
>
>  
>
> I tried to tweak the schema.xml to do that but nothing is working. I
> just have the content indexed.
>
>  
>
> How do I index on the title?
>
>  
>
> Thanks
>
> t.
>
>
>   

