How did you configure Nutch?

Make sure you have the "parse-html" and "index-basic" plugins configured. By default, the HtmlParser extracts the page title and adds it to the parsed data, and the BasicIndexingFilter adds this title to the NutchDocument and stores it in the "title" field. All the SolrIndex (actually the SolrWriter) does is convert the NutchDocument to a SolrInputDocument. So having these plugins configured in Nutch and having a field named "title" in the Solr schema should work. (I'm assuming you're using the "solrindex" tool.)
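
As a rough sketch of the two pieces involved (assuming a Nutch 1.x layout; the plugin list and the "text" field type below are only illustrative and should be adapted to your own nutch-site.xml and schema.xml), the configuration might look something like this:

```xml
<!-- conf/nutch-site.xml: illustrative plugin list, not a recommended set.
     The point is only that parse-html and index-basic are included. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

<!-- Solr schema.xml: a field named "title" to receive the value that
     BasicIndexingFilter puts on the NutchDocument. The type name "text"
     is an assumption; use a text type that your schema actually defines. -->
<field name="title" type="text" stored="true" indexed="true"/>
```

With both in place, a query restricted to that field (for example title:something) should match on page titles after you re-run the crawl and the "solrindex" step.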

Cheers,
Uri

Lassalle, Thibaut wrote:
Hi,

I would like to crawl intranets with Nutch and index them with Solr.

I would like to search mostly on the title of the pages (the one in
<title>This is a title</title>)

I tried to tweak the schema.xml to do that, but nothing is working. I
only get the content indexed.

How do I index on the title?

Thanks

t.

