I'd file a JIRA issue.

On Nov 12, 2011, at 10:39 AM, David T. Webb wrote:

> Same result on onError="continue" .
>
> Any help is appreciated....thank you.
>
> --
> Sincerely,
> David Webb
>
> -----Original Message-----
> From: David T. Webb [mailto:david.w...@brightmove.com]
> Sent: Saturday, November 12, 2011 10:27 AM
> To: solr-user@lucene.apache.org
> Subject: RE: TikaEntityProcesor Exception Handling
>
> I found the answer with the onError="skip" on the Entity, However,
> after adding that parameter to the data-config.xml, the index processing
> still stops when the TikaEntityProcessor throws an Exception.
>
> Nov 12, 2011 10:22:16 AM org.apache.solr.common.SolrException log
> SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 562
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@8a799a
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
>     ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 29
>     at org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:315)
>     at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:60)
>     at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
>     at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
>     at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
>     at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:429)
>     at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:419)
>     at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:187)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     ... 11 more
>
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: start rollback
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: end_rollback
>
> --
> Sincerely,
> David Webb
>
> -----Original Message-----
> From: David T. Webb [mailto:david.w...@brightmove.com]
> Sent: Saturday, November 12, 2011 10:08 AM
> To: solr-user@lucene.apache.org
> Subject: TikaEntityProcesor Exception Handling
>
> When indexing over 2MM documents with Solr and the TikaEntityProcessor,
> the indexing fails if Tika encounters an exception with one of the
> documents. How can I tell Solr to keep going and just ignore the failed
> documents from the Tika Processor?
>
> Thanks.
>
> --
> Sincerely,
> David Webb

- Mark Miller
lucidimagination.com
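
For anyone reading this thread later, here is a rough sketch of where the onError attribute discussed above would sit in a data-config.xml. The poster's actual config was not included in the thread, so everything here is an assumption for illustration: the entity names, the outer FileListEntityProcessor, the baseDir path, the fileName pattern, and the field mapping are all placeholders. The documented onError values for DIH entities are abort (the default), skip, and continue. As reported above, the import still aborted on a Tika exception with both skip and continue in this version, so this shows the setup that was tried, not a confirmed fix.

```xml
<dataConfig>
  <!-- Binary data source that hands file streams to Tika -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Hypothetical outer entity: walks a directory and emits one row per file.
         baseDir and fileName are placeholder values. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.doc"
            rootEntity="false" dataSource="null">
      <!-- Inner entity: onError as tried in the thread ("skip", then "continue").
           The thread reports the full import still failed with either value. -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text"
              onError="skip">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If a JIRA issue is filed as suggested, attaching a config along these lines together with the stack trace above and a sample document that triggers the POI ArrayIndexOutOfBoundsException would probably make the report easier to reproduce.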