This turned out to be an issue with "extractOnly=true" on Solr 3.6.1. We upgraded to Solr 4.0 Beta 2 and the problem went away. Posting this in case anyone else runs into it.
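If you want to check whether a particular Solr instance still shows this behaviour, here is a minimal shell sketch. For each .msg file it sends the extractOnly=true request and the indexing request from the original message quoted below, then prints the HTTP status of each. It rests on a few assumptions. The SOLR_URL variable and the helper names (extract_url, post_status, compare_modes) are hypothetical, and curl must be on PATH. Using each file's base name as literal.id is an illustrative choice; the original used doc1. Be aware that the indexing request really does add and commit a document.

```shell
#!/usr/bin/env bash
# Sketch only: compare Solr Cell responses for extractOnly vs. normal indexing.
# SOLR_URL and all function names are hypothetical; the default URL is taken
# from the curl commands in the original message.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the /update/extract URL for a given mode:
#   extract -> extractOnly=true (return the extracted content, index nothing)
#   index   -> literal.id=<id>&commit=true (index the file as document <id>)
extract_url() {
  local mode="$1" id="$2"
  case "$mode" in
    extract) printf '%s/update/extract?extractOnly=true\n' "$SOLR_URL" ;;
    index)   printf '%s/update/extract?literal.id=%s&commit=true\n' "$SOLR_URL" "$id" ;;
    *)       echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# POST one file as the multipart field "myfile" and print only the HTTP status code.
post_status() {
  local url="$1" file="$2"
  curl -s -o /dev/null -w '%{http_code}' "$url" -F "myfile=@${file}"
}

# For each file, print "<file> extract=<status> index=<status>".
# Assumption: the file's base name (without .msg) is used as literal.id.
compare_modes() {
  local f id
  for f in "$@"; do
    id="$(basename "$f" .msg)"
    printf '%s extract=%s index=%s\n' "$f" \
      "$(post_status "$(extract_url extract)" "$f")" \
      "$(post_status "$(extract_url index "$id")" "$f")"
  done
}
```

To run it, call it as: compare_modes *.msg. Going by the report above, an affected 3.6.1 instance would be expected to show a 500 for extract alongside a 200 for index on the same file.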
Sincerely,
Alex

-----Original Message-----
From: Alexander Cougarman [mailto:acoug...@bwc.org]
Sent: 23 August 2012 12:27 PM
To: solr-user@lucene.apache.org
Subject: Can't extract Outlook message files

Hi. We're trying to use the following curl command to perform an "extract only" of a *.MSG file, but it blows up:

curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@900002.msg"

If we do this, it works fine:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@900002.msg"

We've tried a variety of MSG files, all of which have content, and they all produce the same error. What are we doing wrong?

Here's the exception the extractOnly=true command generates (HTTP ERROR 500, "Problem accessing /solr/update/extract. Reason: null"):

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    ... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
    at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
    at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 26 more

Sincerely,
Alex