Some documents have content that cannot be extracted, and the import gets stuck in Solr's JVM. I get this error:
24/03/2016 à 19:26:59 ERROR null DocBuilder Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2cc58e97
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
    ... 9 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:407)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:256)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:196)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:105)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    ... 12 more