Yes, I am successfully indexing other files with DIH. Now, when I try to index these files with the ExtractingRequestHandler, I get this error:
null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
    ... 27 more
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73)

2016-01-12 1:23 GMT+00:00 Erick Erickson <erickerick...@gmail.com>:
> Looks like a bad file. Do you have any success using DIH on any files?
>
> What happens if you just send that particular file through the
> ExtractingRequestHandler?
>
> Best,
> Erick
>
> On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
> <med.has.kost...@gmail.com> wrote:
> > Some MS Word and PDF files are not being indexed using *dataimport*. I get
> > this error:
> >
> > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
> >     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> >     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> >     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
> > Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
> >     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
> >     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
> >     ... 3 more
> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
> >     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
> >     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
> >     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
> >     ... 5 more
> > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
> >     ... 9 more
> > Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:112)
> >     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
> >     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> >     ... 12 more
> > Caused by: java.util.zip.ZipException: invalid END header (bad central directory offset)
> >     at java.util.zip.ZipFile.open(Native Method)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:220)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:150)
> >     at java.util.zip.ZipFile.<init>(ZipFile.java:164)
> >     at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
> >     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)
> >     ... 16 more
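Both traces bottom out in the file itself rather than in Solr: the DIH run fails while opening the document as a ZIP ("invalid END header (bad central directory offset)"), and the ExtractingRequestHandler run fails because POI finds no content type part in the package ("Package should contain a content type part"). That is consistent with Erick's "bad file" diagnosis. A minimal sketch for checking a suspect .docx/.xlsx/.pptx outside Solr is shown below, assuming only the JDK's `java.util.zip`. The class name `OoxmlCheck` and the `diagnose` helper are hypothetical, and this is a simplified pre-check, not what Tika or POI do internally; a file that passes may still fail to parse.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper: checks whether a file looks like a well-formed
 * OOXML (OPC) package before it is sent to Solr.
 */
public class OoxmlCheck {

    /** Name of the content type part that every OPC package must contain. */
    static final String CONTENT_TYPES = "[Content_Types].xml";

    /** Returns a short diagnosis for the given file. */
    static String diagnose(File file) {
        // Opening a ZipFile reads the central directory, so a truncated or
        // corrupted file fails here (compare the DIH trace: "invalid END header").
        try (ZipFile zip = new ZipFile(file)) {
            if (zip.getEntry(CONTENT_TYPES) == null) {
                // A readable ZIP that is not an OPC package (compare the
                // ExtractingRequestHandler trace: "Package should contain a content type part").
                return "ZIP OK, but no " + CONTENT_TYPES;
            }
            return "looks like an OOXML package";
        } catch (ZipException e) {
            return "not a readable ZIP: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Usage: java OoxmlCheck file1.docx file2.xlsx ...
        for (String path : args) {
            System.out.println(path + ": " + diagnose(new File(path)));
        }
    }
}
```

If a file reports "not a readable ZIP", re-copying or re-saving it from the original application is worth trying before looking further at the Solr configuration.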