Re: solr nutch url indexing

Uri Boness Mon, 24 Aug 2009 13:42:51 -0700

How did you configure nutch?

Make sure you have the "parse-html" and "index-basic" plugins configured. The HtmlParser should by default extract the page title and add it to the parsed data, and the BasicIndexingFilter by default adds this title to the NutchDocument and stores it in the "title" field. All the SolrIndex (actually the SolrWriter) does is convert the NutchDocument to a SolrInputDocument. So having these plugins configured in Nutch and having a field in the schema named "title" should work. (I'm assuming you're using the "solrindex" tool.)
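
As a quick sanity check on the Nutch side, here is a minimal Python sketch that tests whether the "plugin.includes" regular expression in nutch-site.xml matches both plugin ids. The XML excerpt, its plugin list, and the helper name missing_plugins are made up for illustration, and the sketch assumes the plugin.includes value is matched against the whole plugin id; adapt it to your own configuration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt of conf/nutch-site.xml; the plugin list is an example only.
NUTCH_SITE_XML = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-html|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
"""

# The two plugins named above.
REQUIRED_PLUGINS = ("parse-html", "index-basic")


def missing_plugins(site_xml, required=REQUIRED_PLUGINS):
    """Return the required plugin ids that plugin.includes does not match."""
    root = ET.fromstring(site_xml)
    pattern = None
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "plugin.includes":
            pattern = prop.findtext("value", "").strip()
    if pattern is None:
        # Not set in this file. Nutch would fall back to nutch-default.xml,
        # which this sketch does not read, so report everything as missing.
        return list(required)
    regex = re.compile(pattern)
    # Assumption: the regex has to match the whole plugin id.
    return [plugin for plugin in required if regex.fullmatch(plugin) is None]


if __name__ == "__main__":
    print(missing_plugins(NUTCH_SITE_XML))
```

An empty list means both plugins are enabled in that file. The Solr side still needs a field named "title" in schema.xml, which this check does not look at.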


Cheers,
Uri

Lassalle, Thibaut wrote:

Hi,

I would like to crawl intranets with nutch and index them with solr.

I would like to search mostly on the title of the pages (the one in
<title>This is a title</title>)

I tried to tweak the schema.xml to do that but nothing is working. I
just have the content indexed.

How do I index on title?

Thanks

t.
