Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380
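
The reason I point at POI rather than Solr or Tika is that the innermost "Caused by" in the trace quoted below is an ArrayIndexOutOfBoundsException thrown from org.apache.poi code. If you have many failing files and want to compare their root causes, something like the following rough sketch could help. The function name `root_cause` is made up for illustration, and it assumes the trace text has one frame per line (not hard-wrapped by a mail client):

```python
import re

def root_cause(trace: str) -> str:
    """Return the text of the last 'Caused by:' line in a Java stack trace.

    Falls back to the first line of the trace when no 'Caused by:' is present,
    and to an empty string for an empty trace.
    """
    causes = re.findall(r"^\s*Caused by:\s*(.+)$", trace, flags=re.MULTILINE)
    if causes:
        return causes[-1].strip()
    stripped = trace.strip()
    return stripped.splitlines()[0] if stripped else ""
```

Feeding it the text of the `<str name="trace">` element from each failed response would show whether every failure ends in the same POI exception.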

Are these in fact Office 97 documents that are failing?
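
One quick way to narrow that down is to look at the first bytes of each failing file. This is only a sketch: the function names are hypothetical, and the OLE2 signature only tells you the file is in the legacy binary Office container (as opposed to the ZIP-based .docx format); it does not pin down the exact Word version.

```python
# Signature of the OLE2 compound-file container used by legacy binary
# Office formats (.doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP archive, the container used by OOXML (.docx, .xlsx, .pptx).
ZIP_MAGIC = b"PK\x03\x04"

def sniff_office_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'other'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "other"

def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_office_container(f.read(8))
```

Running `sniff_file("15.doc")` on the failing documents and on a few that index cleanly would show whether the failures are confined to the legacy binary format.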

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I can't guarantee that it will work.

In any case, this should be filed in Jira as a bug in Solr 4.0-BETA (SolrCell/Extraction component).

-- Jack Krupansky

-----Original Message-----
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to index, and it's blowing up on some Word docs:

curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. The same files go through Solr 3.6.1 just fine.

   <?xml version="1.0" encoding="UTF-8"?>
   <response>
   <lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
   <lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
   <str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
           at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
           at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
           at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
           at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
           at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
           at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
           at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
           at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
           at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
           at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
           at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
           at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
           at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
           at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
           at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
           at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
           at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
           at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
           at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
           at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
           at org.eclipse.jetty.server.Server.handle(Server.java:351)
           at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
           at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
           at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
           at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
           at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
           at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
           at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
           at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
           at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
           at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
           at java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
           at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
           at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
           at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
           at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
           ... 31 more
   Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
           at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
           at org.apache.poi.hwpf.model.Colorref.&lt;init&gt;(Colorref.java:81)
           at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
           at org.apache.poi.hwpf.usermodel.ShadingDescriptor.&lt;init&gt;(ShadingDescriptor.java:38)
           at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
           at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
           at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
           at org.apache.poi.hwpf.model.StyleSheet.&lt;init&gt;(StyleSheet.java:121)
           at org.apache.poi.hwpf.HWPFDocument.&lt;init&gt;(HWPFDocument.java:346)
           at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
           at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
           at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
           at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
           ... 34 more
   </str><int name="code">500</int></lst>
   </response>

Sincerely,
Alex
