I modified TikaEntityProcessor to ignore these exceptions. If the TikaEntityProcessor encounters an exception it will stop indexing, so I had to make two fixes to TikaEntityProcessor to work around this problem.
From the Solr SVN trunk, edit the file:

  ~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java

First, if a file is not found on disk we want to continue indexing. At the top of nextRow() add:

  File f = new File(context.getResolvedEntityAttribute(URL));
  if (!f.exists()) {
    return null;
  }

Secondly, if the document parser throws an error (for example, certain PDF revisions can cause the PDFBox parser to barf), we trap the exception and continue:

  try {
    tikaParser.parse(is, contentHandler, metadata, new ParseContext());
  } catch (Exception e) {
    return null;
  } finally {
    IOUtils.closeQuietly(is);
  }

Note that we also close the stream with IOUtils.closeQuietly() in the finally block, which the original code does not do. Build and deploy the extras.jar in the solr-instance/lib directory.

see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration
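In case it helps to see the same two guards outside of the DIH code, here is a small standalone sketch (my own example, not part of the patch above) that skips missing or unparsable files using Tika's AutoDetectParser directly; the class name, method name and the unlimited writeLimit are arbitrary choices:

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;

  import org.apache.commons.io.IOUtils;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.parser.ParseContext;
  import org.apache.tika.sax.BodyContentHandler;

  public class SkipBrokenDocs {

    /** Returns the extracted text, or null if the file is missing or unparsable. */
    static String parseOrSkip(String path) {
      File f = new File(path);
      if (!f.exists()) {
        return null;                // missing file: skip instead of aborting
      }
      InputStream is = null;
      try {
        is = new FileInputStream(f);
        BodyContentHandler handler = new BodyContentHandler(-1);  // -1 = no write limit
        new AutoDetectParser().parse(is, handler, new Metadata(), new ParseContext());
        return handler.toString();
      } catch (Exception e) {
        return null;                // parser barfed (e.g. an odd PDF revision): skip
      } finally {
        IOUtils.closeQuietly(is);   // always release the stream
      }
    }

    public static void main(String[] args) {
      for (String path : args) {
        String text = parseOrSkip(path);
        System.out.println(path + " -> " + (text == null ? "skipped" : text.length() + " chars"));
      }
    }
  }

The idea is the same as in the patch: a document that cannot be read or parsed produces null (no row) rather than an exception that kills the whole import.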