Hi,

I'm using Solr 5.4.1 to index thousands of documents, and it works
perfectly for most of them. The problem comes when a document is badly
formatted or contains special characters: Solr hangs or blocks on that
particular document, and the log shows the errors below. I would like to
detect which files are causing these problems, or at least be pointed to
a library I might be missing. Thanks in advance.

Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2cc58e97
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
    ... 9 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:407)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:256)
    at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:196)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:105)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    ... 12 more

25/03/2016 à 11:23:29 ERROR null DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:417)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
        ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:515)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
        ... 5 more
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@702c6cb8
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:159)
        ... 9 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:407)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:256)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:196)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:105)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:201)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:172)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
        ... 12 more
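
Since the log only says "Processing Document # 1" and never names the file, one possible way to find the offending documents is to run each file through the parser on its own, outside Solr, and record which ones throw. The sketch below is only an illustration under assumptions: the class BadFileScanner and the FileParser hook are hypothetical names, and the placeholder parser in main just reads the bytes. With the Tika jars on the classpath, the hook would presumably wrap the same AutoDetectParser.parse call that appears in the trace above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BadFileScanner {

    /** Hypothetical hook: wrap the real parser call (e.g. Tika) here. */
    @FunctionalInterface
    public interface FileParser {
        void parse(InputStream in) throws Exception;
    }

    /** Parses every regular file under root; returns failing path -> root cause. */
    public static Map<Path, Throwable> scan(Path root, FileParser parser) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<Path, Throwable> failures = new LinkedHashMap<>();
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                parser.parse(in);
            } catch (Exception e) {
                // One bad file must not stop the scan: remember it and move on.
                failures.put(file, rootCause(e));
            }
        }
        return failures;
    }

    /** Unwraps nested exceptions down to the innermost "Caused by". */
    static Throwable rootCause(Throwable t) {
        while (t.getCause() != null && t.getCause() != t) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder parser (assumption): it only reads the stream to the end.
        // With Tika available, the lambda could instead be something like:
        //   in -> new AutoDetectParser().parse(in, new BodyContentHandler(-1),
        //                                      new Metadata(), new ParseContext())
        FileParser placeholder = in -> {
            while (in.read() != -1) { /* consume */ }
        };
        Map<Path, Throwable> failures = scan(Paths.get(args[0]), placeholder);
        failures.forEach((path, cause) -> System.out.println(path + " -> " + cause));
    }
}
```

Run as `java BadFileScanner /path/to/documents` (the path is an example). Note that this only catches files that throw an exception; a file that makes the parser hang would need a per-file timeout on top of this.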



Regards,

*Moncif AIDI*, Team Lead Engineer at TeslaTeam-Maroc
<http://www.teslateam.ma/>
M:+212 658 541 045 | T:+212 537 70 81 21
Linkedin
<https://www.linkedin.com/profile/view?id=131220035&trk=nav_responsive_tab_profile>
 | Facebook <https://www.facebook.com/M0ziNsof> | Twitter
<http://twitter.com/teslateam> | *Skype :* moncif44
