Re: Resume Solr indexing CSV after exception

Brad Greenlee Fri, 11 Jun 2010 06:43:01 -0700

Why not just use the onError attribute on the entity? The default is "abort",
but you can also specify "skip" to skip the current document, or "continue"
to continue as if the error never happened. See
http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
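
For illustration, here is a minimal sketch of a data-config.xml with onError
set on the Tika entity. The entity names, the baseDir path, and the field
names are placeholders I made up, not anything from your setup, and whether
a given parser failure is actually caught this way may depend on your Solr
version.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity lists the files; baseDir and fileName are hypothetical. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            rootEntity="false" dataSource="null">
      <!-- onError="skip" drops the failing document and moves on;
           the default, "abort", stops the whole import. -->
      <entity name="doc" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text"
              onError="skip">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With "continue" in place of "skip", the import carries on as if the error
never happened rather than dropping the document.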


Brad

On Fri, Jun 11, 2010 at 3:00 AM, David George <david.geo...@gmail.com> wrote:

>
> I modified TikaEntityProcessor to ignore these exceptions.
>
> If TikaEntityProcessor encounters an exception, it stops indexing.
> I had to make two fixes to TikaEntityProcessor to work around this problem.
>
> In a checkout of the Solr SVN trunk, edit the file:
>
>
> ~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java
>
> First of all, if a file is not found on disk, we want to continue
> indexing. At the top of nextRow(), add:
>
> File f = new File (context.getResolvedEntityAttribute(URL));
> if (! f.exists()) {
>  return null;
> }
>
> Secondly, if the document parser throws an error (for example, certain PDF
> revisions can cause the PDFBox parser to barf), we trap the exception
> and continue:
>
> try {
>  tikaParser.parse(is, contentHandler, metadata , new ParseContext());
> } catch (Exception e) {
>  return null;
> } finally {
>  IOUtils.closeQuietly(is);
> }
>
> We also close the input stream with IOUtils.closeQuietly() in the finally
> block, which the original code does not do. Build the extras jar and deploy
> it to the solr-instance/lib directory.
>
> see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Resume-Solr-indexing-CSV-after-exception-tp878801p888143.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
