I am trying to index a document, a .docx file, and end up with the exception
below. I am not sure why it fails. When I opened the .docx I could see a lot
of binary data, such as embedded pictures. Is there a possible solution to
this, or is it a bug? Only this one file fails; the rest of the files are
indexed without problems.

2015-11-04 23:16:11.549 INFO  (coreLoadExecutor-6-thread-1) [   x:tika] o.a.s.c.CoreContainer registering core: tika
2015-11-04 23:16:11.549 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore QuerySenderListener sending requests to Searcher@1eb69b2[tika] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2015-11-04 23:16:11.585 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.S.Request [tika] webapp=null path=null params={q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&event=firstSearcher} hits=0 status=0 QTime=34
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore QuerySenderListener done.
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: default
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: wordbreak
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.h.c.SuggestComponent buildOnStartup: mySuggester
2015-11-04 23:16:11.586 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
2015-11-04 23:16:11.605 INFO  (searcherExecutor-7-thread-1-processing-x:tika) [   x:tika] o.a.s.c.SolrCore [tika] Registered new searcher Searcher@1eb69b2[tika] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2015-11-04 23:16:25.923 INFO  (qtp7980742-16) [   x:tika] o.a.s.h.d.DataImporter Loading DIH Configuration: tika-data-config.xml
2015-11-04 23:16:25.937 INFO  (qtp7980742-16) [   x:tika] o.a.s.h.d.DataImporter Data Configuration loaded successfully
2015-11-04 23:16:25.947 INFO  (qtp7980742-16) [   x:tika] o.a.s.c.S.Request [tika] webapp=/solr path=/dataimport params={debug=false&optimize=false&indent=true&commit=true&clean=true&wt=json&command=full-import&verbose=false} status=0 QTime=28
2015-11-04 23:16:25.948 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DataImporter Starting Full Import
2015-11-04 23:16:25.961 INFO  (Thread-17) [   x:tika] o.a.s.h.d.SimplePropertiesWriter Read dataimport.properties
2015-11-04 23:16:25.966 INFO  (qtp7980742-14) [   x:tika] o.a.s.c.S.Request [tika] webapp=/solr path=/dataimport params={indent=true&wt=json&command=status&_=1446678985952} status=0 QTime=1
2015-11-04 23:16:25.998 INFO  (Thread-17) [   x:tika] o.a.s.c.SolrCore [tika] REMOVING ALL DOCUMENTS FROM INDEX
2015-11-04 23:16:26.728 ERROR (Thread-17) [   x:tika] o.a.s.h.d.EntityProcessorWrapper Exception in entity : documentImport:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
      at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
      at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
      at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
      at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
      at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1b3e0a6
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:262)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
      ... 9 more
Caused by: java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0xb7 implies a length of more than 4 bytes
      at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
      at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
      at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
      at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3477)
      at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3962)
      at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
      at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
      at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
      at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
      at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
      at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
      at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
      at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:136)
      at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:166)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:118)
      at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:59)
      at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:181)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
      ... 12 more


2015-11-04 23:16:26.729 INFO  (Thread-17) [   x:tika] o.a.s.h.d.DocBuilder Import completed successfully
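
One way to narrow this down, offered as a rough sketch and not a confirmed diagnosis: the innermost cause in the trace is a UTF-8 decoding failure raised while POI was reading the main document part, and a .docx is a ZIP archive of XML parts. The Python snippet below scans each XML part of the archive and reports the first byte that does not decode as strict UTF-8. The function name `find_invalid_utf8_parts` is made up for illustration, and the check assumes every XML part is UTF-8 encoded, which may not hold if a part declares another encoding. A file that passes this check could still trip the parser shown in the trace, so a clean result does not rule out a parser-side problem.

```python
import sys
import zipfile


def find_invalid_utf8_parts(docx_path):
    """Return (part_name, byte_offset, bad_byte) for each XML part of the
    archive that does not decode as strict UTF-8.

    Assumption: XML parts are expected to be UTF-8. Binary parts such as
    embedded images are skipped because they are not parsed as XML text.
    """
    problems = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Only look at XML-like parts (document body, styles, relationships).
            if not name.endswith((".xml", ".rels")):
                continue
            data = archive.read(name)
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte.
                problems.append((name, err.start, data[err.start]))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for part, offset, bad in find_invalid_utf8_parts(sys.argv[1]):
        print(f"{part}: invalid UTF-8 at byte offset {offset} (0x{bad:02x})")
```

Run it against the one failing file, for example `python check_docx.py failing.docx` (both file names are placeholders). If it reports a byte in `word/document.xml`, the file itself is probably damaged or was written by a tool that emitted bad bytes; if it reports nothing, the problem is more likely on the parsing side.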
