Then you probably have a corrupt file or have discovered a Tika bug. Next I'd try running the file through stand-alone Tika, perhaps trying different versions of Tika. If it turns out to be a Tika bug that a newer release fixes, you can always use a more recent version of Tika with Solr and/or process the file on a SolrJ client (which I recommend anyway for high-volume systems).
See: https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Jan 12, 2016 at 2:03 AM, kostali hassan <med.has.kost...@gmail.com> wrote:
> Yes, I am successfully indexing other files with DIH; now when I try to index
> these files with ExtractingRequestHandler I get this error:
>
> null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:499)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>     ... 27 more
> Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
>     at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203)
>     at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673)
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73)
>
>
> 2016-01-12 1:23 GMT+00:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Looks like a bad file. Do you have any success using DIH on any files?
>>
>> What happens if you just send that particular file through the
>> ExtractingRequestHandler?
>>
>> Best,
>> Erick
>>
>> On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
>> <med.has.kost...@gmail.com> wrote:
>> > Some MS Word and PDF files are not being indexed using dataimport; I get
>> > this error:
>> >
>> > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
>> >     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
>> >     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>> >     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>> > Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
>> >     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
>> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
>> >     ... 3 more
>> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
>> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
>> >     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
>> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
>> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
>> >     ... 5 more
>> > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
>> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
>> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
>> >     ... 9 more
>> > Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
>> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:112)
>> >     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
>> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
>> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>> >     ... 12 more
>> > Caused by: java.util.zip.ZipException: invalid END header (bad central directory offset)
>> >     at java.util.zip.ZipFile.open(Native Method)
>> >     at java.util.zip.ZipFile.<init>(ZipFile.java:220)
>> >     at java.util.zip.ZipFile.<init>(ZipFile.java:150)
>> >     at java.util.zip.ZipFile.<init>(ZipFile.java:164)
>> >     at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
>> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)
>> >     ... 16 more
>>
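
Both root causes in the traces above (the ZipException about the END header and POI's "Package should contain a content type part") point at the file itself rather than at Solr. As a rough way to confirm the "bad file" diagnosis before involving Tika at all, here is a minimal sketch using only the JDK. It assumes the file is meant to be OOXML (.docx, .xlsx, .pptx), which is a ZIP archive with a `[Content_Types].xml` part at its root. The class name `OoxmlSanityCheck` and the `Result` values are made up for illustration; this is not part of Solr, Tika, or POI, and it is not the check those libraries perform internally.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

/** Hypothetical pre-flight check for OOXML files before they are sent to Solr. */
final class OoxmlSanityCheck {

    enum Result { OK, NOT_A_ZIP, MISSING_CONTENT_TYPES }

    /**
     * Opens the file as a ZIP archive and looks for the [Content_Types].xml part.
     * A ZipException means the archive structure itself is unreadable.
     * Other I/O problems (for example a missing file) are propagated to the caller.
     */
    static Result check(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            return zip.getEntry("[Content_Types].xml") != null
                    ? Result.OK
                    : Result.MISSING_CONTENT_TYPES;
        } catch (ZipException e) {
            return Result.NOT_A_ZIP;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OoxmlSanityCheck file1.docx file2.xlsx ...
        for (String arg : args) {
            System.out.println(arg + ": " + check(Paths.get(arg)));
        }
    }
}
```

A file reported as NOT_A_ZIP or MISSING_CONTENT_TYPES is most likely truncated or otherwise damaged, or is not really an OOXML document (a legacy .doc saved with a .docx extension, say). A file reported as OK that still fails in Solr is a better candidate for the stand-alone Tika test and a possible Tika bug report. The check says nothing useful about PDFs or legacy Office formats, which are not ZIP-based.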