I'm using Nutch 1.6 to retrieve metadata from crawled documents (e.g. .doc,
.ppt, .pdf, etc.) for indexing by Solr 4.0. Several of the crawled files
have no value or a junk value for certain metatags. Is there a way to force
Solr to skip indexing of documents where, say, metatag.title is empty or
me
Jack, thanks for the response. So, adding something as simple as the
following to the processAdd() function should do the trick in your opinion?
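Object this_title = doc.getFieldValue("title");
// equals() compares the text; == only compares object references in Java
if ("Slide 1".equals(this_title)) {
    // processAdd() is void, so just return without calling super.processAdd(cmd)
    return;
}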
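
The check itself can be kept separate from the Solr plumbing, which makes it easy to try out on its own. Below is a minimal sketch of that idea, not a tested Solr plugin: the class name TitleFilter, the method shouldSkip() and the JUNK_TITLES set are hypothetical names made up for illustration, and "Slide 1" is the only junk value taken from this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides whether a document's title is unusable.
final class TitleFilter {

    // Assumed list of junk titles; extend it with whatever the crawl produces.
    private static final Set<String> JUNK_TITLES =
            new HashSet<String>(Arrays.asList("Slide 1"));

    private TitleFilter() {
    }

    // Returns true when the title is missing, blank, or a known junk value.
    // Takes Object because SolrInputDocument.getFieldValue() returns Object.
    static boolean shouldSkip(Object titleValue) {
        if (titleValue == null) {
            return true;
        }
        String title = titleValue.toString().trim();
        return title.isEmpty() || JUNK_TITLES.contains(title);
    }
}
```

Inside a custom UpdateRequestProcessor, processAdd(AddUpdateCommand cmd) could then fetch the document with cmd.getSolrInputDocument(), return early when TitleFilter.shouldSkip(doc.getFieldValue("title")) is true, and otherwise call super.processAdd(cmd) so the document continues down the chain. The field name "title" is assumed here; it depends on how the Nutch metatag fields are mapped in the Solr schema.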
Regards,
ADS