Hey Andrea! Thanks for answering. The complete stack trace is below
(the other one is just the same).
I'm going to try changing the logging level as you suggested, but I'm
seriously considering debugging Tika and trying to fix it myself.
 
 

03:38:23 ERROR SolrCore org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    ... 32 more
Caused by: java.lang.IllegalStateException: Told we're for characters 122 -> 978, but actually covers 855 characters!
    at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:73)
    at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:111)
    at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:72)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:462)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 35 more
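
Reading the trace, the innermost cause is the IllegalStateException thrown
in POI's TextPiece constructor while HWPFOldDocument loads the file; Tika
wraps it in a TikaException and Solr wraps that in a SolrException. Until
that is fixed upstream, one possible client-side workaround is to guard each
file with a timeout and treat any exception as "skip this file". The sketch
below is only an illustration in plain Java: GuardedExtraction and
extractWithTimeout are hypothetical names, not part of Solr, Tika or POI,
and the Callable stands in for whatever call actually sends the file to Solr.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of Solr, Tika or POI.
class GuardedExtraction {

    // Runs the given task on a separate daemon thread and waits at most
    // timeoutMillis for it. Returns an empty Optional if the task throws,
    // times out, or the caller is interrupted, so one broken file cannot
    // stall the whole indexing run.
    static Optional<String> extractWithTimeout(Callable<String> extraction, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "guarded-extraction");
            thread.setDaemon(true); // do not keep the JVM alive for a stuck task
            return thread;
        });
        Future<String> result = worker.submit(extraction);
        try {
            return Optional.ofNullable(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // The task is taking too long: ask it to stop and report a failure.
            result.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The task threw; e.getCause() holds the original exception,
            // for example an IllegalStateException like the one in the trace.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and give up on this file.
            Thread.currentThread().interrupt();
            result.cancel(true);
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would wrap each upload, for example
extractWithTimeout(() -> uploadToSolr(file), 30000), where uploadToSolr is
again a placeholder, and log the file name whenever the result is empty.
Note that cancelling only interrupts the worker thread, so a task that
ignores interrupts would keep running in the background.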


>>> Andrea Gazzarini <a.gazzar...@gmail.com> 17/12/2013 16:43 >>>
Hi Augusto,
I don't believe the mailing list allows attachments. Could you please post
the complete stack trace? In addition, set the logging level of the Tika
classes to FINEST in the Solr console; that may be helpful.

Best,
Andrea
On 17 Dec 2013 16:30, "Augusto Camarotti" <augu...@prpb.mpf.gov.br> wrote:

>  Hi guys,
>
>    I'm having a problem with Solr when trying to index some broken .doc
> files.
>    I have set up a test case using Solr to index all the files that users
> save in the shared directories of the company I work for, and Solr
> hangs when trying to index one file in particular (the one I'm attaching
> to this e-mail). There are some other broken .doc files that Solr indexes by
> name without a problem, even though it logs some Tika errors during the
> process, but when it reaches this particular file, it hangs and I have
> to cancel the upload.
>    I cannot guarantee the directories will never hold a broken .doc file,
> or a broken file with some other extension, so I guess Solr could just
> return a failure message, or something like that.
>    These are the log messages Solr is recording:
>
>
>   03:38:23 ERROR SolrCore org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@386f9474
>
>   03:38:25 ERROR SolrDispatchFilter null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@386f9474
>
> So, how do I prevent Solr from hanging when trying to index broken files?
>
> Regards,
>
> Augusto Camarotti
>
