You *probably* can update the Tika libraries in Solr, but it'll be "interesting" to get all the right ones updated; there are a bunch of them in Tika. And I make no guarantees.
If it proves difficult, it's not too hard to write a SolrJ program that does the Tika extraction and run it on a client totally separated from the Solr server.

Best
Erick

On Sun, Feb 26, 2012 at 7:33 PM, Matthew Parker <mpar...@apogeeintegration.com> wrote:
> I tried to import some documents into SOLR Cloud using Apache Manifold.
>
> TIKA started throwing exceptions for various documents
>
> The exception reads like the following:
>
> org.apache.solr.common.SolrException
> at org.apache.solr.handler.extraction.ExtractionDocumentLoader.load(
> ExtractingDocumentLoader.java: 213)
> ..........
>
> Caused by: org.apache.tika.exception.TikaException:
> UnexpectedRuntimeException from
> org.apche.tika.parser.microsoft.OfficeParser@d394424
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> ...........
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(NativeMethod)
> at
> org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:363)
>
> It seems to be related to the following fix now in Tika 1.1
>
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51902
>
> Can the Tika libraries in the SOLR trunk be updated?
>
> ------------------------------
> This e-mail and any files transmitted with it may be proprietary. Please
> note that any views or opinions presented in this e-mail are solely those of
> the author and do not necessarily represent those of Apogee Integration.
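
For anyone wanting to try the client-side route suggested in the reply, here is a minimal sketch in Java. To keep it compilable without Tika or SolrJ on the classpath, the extraction and indexing steps sit behind two hypothetical interfaces, `TextExtractor` and `Indexer`. In a real program the first would wrap a Tika parser and the second a SolrJ client. The class name and the field names `id` and `text` are also assumptions, not something from this thread. What it illustrates is that a parser failure such as the ArrayIndexOutOfBoundsException quoted above is caught per file on the client and never reaches the Solr server.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClientSideExtraction {

    /** Hypothetical stand-in for the Tika call that turns a file into plain text. */
    interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /** Hypothetical stand-in for the SolrJ call that sends one document to Solr. */
    interface Indexer {
        void add(Map<String, Object> fields) throws Exception;
    }

    /**
     * Extracts and indexes each file in turn.
     * Returns the files that could not be processed, so they can be
     * logged or retried later with a newer parser.
     */
    static List<Path> indexAll(List<Path> files, TextExtractor extractor, Indexer indexer) {
        List<Path> failed = new ArrayList<>();
        for (Path file : files) {
            try {
                String text = extractor.extract(file);
                Map<String, Object> doc = new LinkedHashMap<>();
                doc.put("id", file.toString()); // assumed unique-key field name
                doc.put("text", text);          // assumed content field name
                indexer.add(doc);
            } catch (Exception e) {
                // Runtime failures inside the parser land here too, so one
                // bad document does not stop the rest of the run.
                failed.add(file);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sent = new ArrayList<>();
        // Fake extractor: pretends every .doc file trips a parser bug.
        TextExtractor fake = f -> {
            if (f.toString().endsWith(".doc")) {
                throw new ArrayIndexOutOfBoundsException();
            }
            return "text of " + f;
        };
        List<Path> failed = indexAll(
                List.of(Path.of("good.pdf"), Path.of("bad.doc")), fake, sent::add);
        System.out.println("indexed=" + sent.size() + " failed=" + failed);
    }
}
```

Keeping the extractor behind its own interface also means the Tika version on the client can be changed without touching the jars that ship with Solr.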