I modified TikaEntityProcessor to ignore these exceptions:

If the TikaEntityProcessor encounters an exception, it stops indexing.
I had to make two fixes to TikaEntityProcessor to work around this problem.

From the Solr SVN trunk, edit the file:

~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java

First, if a file is not found on disk, we want to continue indexing. At
the top of nextRow(), add:

// The entity's url attribute resolves to the path of the file being indexed.
File f = new File(context.getResolvedEntityAttribute(URL));
if (!f.exists()) {
  // Returning null skips this document so indexing can continue.
  return null;
}
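
If you would rather not skip missing files silently, the same check can
log the path first. This is only a drop-in variant of the lines above; the
SLF4J call is an assumption about what logging is available on your
classpath:

File f = new File(context.getResolvedEntityAttribute(URL));
if (!f.exists()) {
  // Illustrative only: warn before skipping so missing files show up in the logs.
  org.slf4j.LoggerFactory.getLogger(TikaEntityProcessor.class)
      .warn("Skipping missing file: {}", f.getAbsolutePath());
  return null;
}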

Second, if the document parser throws an error (certain PDF revisions,
for example, can cause the PDFBox parser to barf), we trap the exception
and continue:

try {
  tikaParser.parse(is, contentHandler, metadata, new ParseContext());
} catch (Exception e) {
  // Swallow the parse failure (e.g. a PDFBox error) and skip this document.
  return null;
} finally {
  // Always release the input stream, even when parsing fails.
  IOUtils.closeQuietly(is);
}

We also close the input stream in the finally block (via
IOUtils.closeQuietly), which the original code does not do. Build and
deploy the extras.jar in the solr-instance/lib directory.
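
For anyone who wants to try the same skip-on-error behaviour outside of
the DataImportHandler plumbing, below is a minimal standalone sketch that
applies both guards around a plain Tika AutoDetectParser. The class and
method names are made up for illustration; only the exists() check, the
catch-and-skip, and the closeQuietly() in the finally block mirror the
patch above.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class SkipOnErrorExtractor {

  // Returns the extracted text, or null if the file is missing or the
  // parser fails -- the same "return null and keep going" behaviour
  // patched into nextRow() above.
  public static String extractOrSkip(String path) {
    File f = new File(path);
    if (!f.exists()) {
      return null;                      // missing file: skip, don't abort
    }
    InputStream is = null;
    try {
      is = new FileInputStream(f);
      BodyContentHandler contentHandler = new BodyContentHandler(-1);
      Metadata metadata = new Metadata();
      new AutoDetectParser().parse(is, contentHandler, metadata, new ParseContext());
      return contentHandler.toString();
    } catch (Exception e) {
      return null;                      // parser barfed (e.g. PDFBox): skip
    } finally {
      IOUtils.closeQuietly(is);         // always release the stream
    }
  }
}

The -1 passed to BodyContentHandler simply disables Tika's default write
limit so larger documents are not truncated.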

see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration