I'd file a JIRA issue.

On Nov 12, 2011, at 10:39 AM, David T. Webb wrote:

> Same result with onError="continue".
> 
> Any help is appreciated. Thank you.
> 
> --
> Sincerely,
> David Webb
> 
> 
> 
> -----Original Message-----
> From: David T. Webb [mailto:david.w...@brightmove.com] 
> Sent: Saturday, November 12, 2011 10:27 AM
> To: solr-user@lucene.apache.org
> Subject: RE: TikaEntityProcesor Exception Handling
> 
> I found the answer: onError="skip" on the entity. However, after adding
> that attribute to data-config.xml, the index processing still stops
> when the TikaEntityProcessor throws an exception.
> 
> Nov 12, 2011 10:22:16 AM org.apache.solr.common.SolrException log
> SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 562
>        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
>        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@8a799a
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
>        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
>        ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 29
>        at org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:315)
>        at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:60)
>        at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
>        at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
>        at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
>        at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:429)
>        at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:419)
>        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:187)
>        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>        ... 11 more
> 
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: start rollback
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: end_rollback
> --
> Sincerely,
> David Webb
> 
> 
> 
> -----Original Message-----
> From: David T. Webb [mailto:david.w...@brightmove.com]
> Sent: Saturday, November 12, 2011 10:08 AM
> To: solr-user@lucene.apache.org
> Subject: TikaEntityProcesor Exception Handling
> 
> When indexing over 2 million documents with Solr and the
> TikaEntityProcessor, the indexing fails if Tika encounters an exception
> with one of the documents.  How can I tell Solr to keep going and just
> ignore the documents that fail in the TikaEntityProcessor?
> 
> 
> 
> Thanks.
> 
> 
> 
> --
> 
> Sincerely,
> 
> David Webb
> 

- Mark Miller
lucidimagination.com










