Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380
Are these in fact Office 97 documents that are failing?
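One rough way to narrow this down is to look at the first bytes of each failing file. The sketch below is only an illustration, not part of Solr, Tika, or POI; the function names `sniff_container` and `sniff_file` are hypothetical. It reports whether a file is an OLE2 compound file (the container used by the binary .doc format), a ZIP package (as used by .docx), or RTF. Note that the OLE2 signature alone does not tell Word 97 apart from other binary Word versions, so treat it as a first filter only.

```python
# Hypothetical helper: classify a document by its leading magic bytes.
# This checks the container type only, not the exact Word version.

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (.docx)
RTF_MAGIC = b"{\\rtf"                           # RTF files start with {\rtf


def sniff_container(head: bytes) -> str:
    """Return 'ole2', 'zip', 'rtf', or 'unknown' for the given leading bytes."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

For example, `sniff_file("15.doc")` returning "ole2" would be consistent with a binary Word document, while "zip" or "rtf" would suggest the file is something else saved with a .doc extension.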
Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.
It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I
can't guarantee that it would work.
In any case, this should be filed in Jira as a bug in Solr 4.0-BETA
(SolrCell/Extraction component).
-- Jack Krupansky
-----Original Message-----
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta
Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to
index, and it's blowing up on some Word docs:
curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"
Here's the exception. And the same files go through Solr 3.6.1 just fine.
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
<lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
<str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
    ... 31 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
    at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
    at org.apache.poi.hwpf.model.Colorref.<init>(Colorref.java:81)
    at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
    at org.apache.poi.hwpf.usermodel.ShadingDescriptor.<init>(ShadingDescriptor.java:38)
    at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
    at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
    at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
    at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 34 more
</str><int name="code">500</int></lst>
</response>
Sincerely,
Alex