
Skip Indexing Certain Files on Purpose

2013-08-01 Thread stone2dbone

I'm using Nutch 1.6 to retrieve metadata from crawled documents (e.g. .doc, .ppt, .pdf) for indexing by Solr 4.0. Several of the crawled files have no value, or a junk value, for certain metatags. Is there a way to force Solr to skip indexing documents where, say, metatag.title is empty or me

Re: Skip Indexing Certain Files on Purpose

2013-08-02 Thread stone2dbone

Jack, thanks for the response. So, in your opinion, adding something as simple as the following to the processAdd() function should do the trick?

    this_title = doc.getFieldValue("title");
    if (this_title == "Slide 1"){
        return false;
    }

Regards, ADS
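For reference, a minimal sketch of the check itself, with a few assumptions flagged. In Java, == on strings compares object references, so the comparison needs equals() (or a set lookup, as here). Also, processAdd() in Solr's UpdateRequestProcessor returns void, so a document is skipped by returning without passing the command down the chain rather than by returning false. TitleFilter and shouldSkip are hypothetical names, not part of Solr, and the wiring into the processor is shown only as a comment since it needs the Solr jars to compile.

```java
import java.util.Set;

// Hypothetical helper (not a Solr class): decides whether a document
// should be dropped based on its title value.
final class TitleFilter {
    // Junk titles to reject. "Slide 1" is the only value named in the
    // thread; extend this set as needed.
    private static final Set<String> JUNK_TITLES = Set.of("Slide 1");

    // getFieldValue() on SolrInputDocument returns Object, so accept Object here.
    static boolean shouldSkip(Object titleValue) {
        if (titleValue == null) {
            return true; // field missing entirely
        }
        String title = titleValue.toString().trim();
        // Skip empty/whitespace-only titles and known junk values.
        return title.isEmpty() || JUNK_TITLES.contains(title);
    }
}

// Assumed wiring inside a custom UpdateRequestProcessor subclass
// (not compiled here; requires Solr on the classpath):
//
//   @Override
//   public void processAdd(AddUpdateCommand cmd) throws IOException {
//       SolrInputDocument doc = cmd.getSolrInputDocument();
//       if (TitleFilter.shouldSkip(doc.getFieldValue("title"))) {
//           return;              // drop: do not forward to the next processor
//       }
//       super.processAdd(cmd);   // keep: continue down the chain
//   }
```

The field name "title" is taken from the snippet above; if Nutch maps the metatag to a different Solr field (e.g. one derived from metatag.title), that name would need to be used instead.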