I tried to import some documents into SolrCloud using Apache ManifoldCF.

Tika started throwing exceptions for various documents.

The exception looks like this:

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
..........

Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d394424
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
...........
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:363)

This seems to be related to the following bug, whose fix is now in Tika 1.1:

https://issues.apache.org/bugzilla/show_bug.cgi?id=51902

Can the Tika libraries in the SOLR trunk be updated?
