I believe these are the older Word 97 (*.doc) files. The problem was that 
Solr 3.6.1 blew up on *.MSG files when doing extractOnly=true, so we upgraded 
to Solr 4.0 and now run into this. If we switch to Tika 1.0, I'm afraid the 
DOC files will be fixed but the MSG files will break!
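
For anyone wanting to check both file types against a given Solr/Tika 
combination without touching the index, here is a minimal sketch. It assumes 
the default URL from the original command; SOLR_URL, extract_url and 
probe_file are hypothetical names chosen for illustration, and "myfile" is 
just the form field name used in the original curl call.

```shell
#!/usr/bin/env bash
# Sketch: probe files with extractOnly=true so nothing is added to the index.
# SOLR_URL is an assumption; adjust it to your installation.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the extract-only endpoint URL for a given Solr base URL.
extract_url() {
  printf '%s/update/extract?extractOnly=true' "$1"
}

# Print "OK <file>" when Solr answers 200, otherwise "FAIL <status> <file>".
probe_file() {
  local file="$1" status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -F "myfile=@${file}" "$(extract_url "$SOLR_URL")")
  if [ "$status" = "200" ]; then
    echo "OK $file"
  else
    echo "FAIL $status $file"
  fi
}

# Probe every file given on the command line (no arguments: nothing happens).
for f in "$@"; do
  probe_file "$f"
done
```

Saved under a name of your choice and run with a list of *.doc and *.msg 
files as arguments, this should show which files a given setup rejects; it 
only reports HTTP status codes, not the underlying exception.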

Sincerely,
Alex Cougarman

Bahá'í World Centre
Haifa, Israel
Office: +972-4-835-8683 
Cell: +972-54-241-4742
acoug...@bwc.org  


-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 29 August 2012 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Are these in fact Office 97 documents that are failing?

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I 
wouldn't try to guarantee that.

In any case, this should be filed in Jira as a bug in Solr 4.0-BETA 
(SolrCell/Extraction component).

-- Jack Krupansky

-----Original Message-----
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. And the same files go through Solr 3.6.1 just fine.

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
    <lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
    <str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
            at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
            at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
            at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
            at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
            at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
            at org.eclipse.jetty.server.Server.handle(Server.java:351)
            at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
            at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
            at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
            at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
            at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
            at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
            at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
            at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
            at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
            at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
            at java.lang.Thread.run(Unknown Source)
    Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
            ... 31 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
            at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
            at org.apache.poi.hwpf.model.Colorref.&lt;init&gt;(Colorref.java:81)
            at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
            at org.apache.poi.hwpf.usermodel.ShadingDescriptor.&lt;init&gt;(ShadingDescriptor.java:38)
            at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
            at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
            at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
            at org.apache.poi.hwpf.model.StyleSheet.&lt;init&gt;(StyleSheet.java:121)
            at org.apache.poi.hwpf.HWPFDocument.&lt;init&gt;(HWPFDocument.java:346)
            at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
            ... 34 more
    </str><int name="code">500</int></lst>
    </response>

Sincerely,
Alex 
