Then you probably have a corrupt file or have
discovered a Tika bug.

Next I'd try running the file through stand-alone Tika,
perhaps trying different versions of Tika. If it turns out
to be a Tika bug, you can always use a more recent version
of Tika with Solr and/or process the file on a SolrJ client
(which I recommend anyway for high-volume systems).

See:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
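
Before going to Tika at all, a quick pre-check can tell you whether the
file is even a readable OOXML package. Both traces quoted below bottom out
in ZIP/package errors (an invalid ZIP END header, and a missing content
type part), so a minimal sketch using only java.util.zip might look like
the following. The class name OoxmlPackageCheck and the Result values are
made up for illustration, and this is not how Tika or POI validate a
package; it only tests the two conditions those errors point at.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

/** Rough sanity check for OOXML files (.docx, .xlsx, .pptx); hypothetical helper. */
class OoxmlPackageCheck {

    enum Result { OK, NOT_A_ZIP, MISSING_CONTENT_TYPES, UNREADABLE }

    static Result check(Path file) {
        // OOXML documents are ZIP archives, so a damaged archive fails right here.
        try (ZipFile zip = new ZipFile(file.toFile())) {
            // The "[M1.13]" error is about this entry being absent from the package.
            return zip.getEntry("[Content_Types].xml") != null
                    ? Result.OK
                    : Result.MISSING_CONTENT_TYPES;
        } catch (ZipException e) {
            // Present on disk, but not a valid ZIP archive.
            return Result.NOT_A_ZIP;
        } catch (IOException e) {
            // Any other I/O problem, for example a missing or locked file.
            return Result.UNREADABLE;
        }
    }

    public static void main(String[] args) {
        // Print one line per file path given on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + check(Paths.get(arg)));
        }
    }
}
```

Run it with the paths of the suspect files as arguments. Anything other
than OK suggests the file itself is damaged (or is not really OOXML despite
its extension) rather than a Tika problem.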

Best,
Erick

On Tue, Jan 12, 2016 at 2:03 AM, kostali hassan
<med.has.kost...@gmail.com> wrote:
> Yes, I am successfully indexing other files with DIH. Now, when I try to index these
> files with ExtractingRequestHandler, I get this error:
>
> null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Error creating OOXML extractor
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>         at org.eclipse.jetty.server.Server.handle(Server.java:499)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>         at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:122)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>         ... 27 more
> Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
>         at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:203)
>         at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:673)
>         at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:73)
>
>
> 2016-01-12 1:23 GMT+00:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Looks like a bad file. Do you have any success using DIH on any files?
>>
>> What happens if you just send that particular file through the
>> ExtractingRequestHandler?
>>
>> Best,
>> Erick
>>
>> On Mon, Jan 11, 2016 at 3:51 PM, kostali hassan
>> <med.has.kost...@gmail.com> wrote:
>> > Some MS Word and PDF files are not being indexed with dataimport. I get this
>> > error:
>> >
>> > Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
>> >         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
>> >         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
>> >         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
>> > Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
>> >         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
>> >         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
>> >         ... 3 more
>> > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
>> >         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
>> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:168)
>> >         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
>> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
>> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
>> >         ... 5 more
>> > Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@188120
>> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:258)
>> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>> >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:162)
>> >         ... 9 more
>> > Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'D:\solr\solr-5.3.1\server\tmp\apache-tika-121920532070319073.tmp'
>> >         at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:112)
>> >         at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:224)
>> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)
>> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
>> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>> >         ... 12 more
>> > Caused by: java.util.zip.ZipException: invalid END header (bad central directory offset)
>> >         at java.util.zip.ZipFile.open(Native Method)
>> >         at java.util.zip.ZipFile.<init>(ZipFile.java:220)
>> >         at java.util.zip.ZipFile.<init>(ZipFile.java:150)
>> >         at java.util.zip.ZipFile.<init>(ZipFile.java:164)
>> >         at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipFile(ZipHelper.java:174)
>> >         at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:110)
>> >         ... 16 more
>>
