I tried to import some documents into SOLR Cloud using Apache Manifold, and TIKA started throwing exceptions for a number of them.
The exception reads as follows:

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
    ..........
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d394424
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    ...........
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:363)

It seems to be related to the following fix, which is now in Tika 1.1:

https://issues.apache.org/bugzilla/show_bug.cgi?id=51902

Can the Tika libraries in the SOLR trunk be updated?