This is a Tika/POI problem. Please download tika-app 1.14 [1] or a nightly build of Tika [2] and run

java -jar tika-app.jar <your_file.vsdx>

If the problem is fixed, we'll try to upgrade the dependencies in Solr. If it isn't fixed, please open a bug on Tika's Jira.

If this is a missing-bean issue (sorry, I can't tell from your stack trace which class is missing), then as a temporary workaround you can remove the "poi-ooxml-schemas" jar and add the full "ooxml-schemas" jar, and you should be good to go [3].

Cheers,

Tim

[1] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.14.jar
[2] https://builds.apache.org/job/Tika-trunk/1193/org.apache.tika$tika-app/artifact/org.apache.tika/tika-app/1.15-20170202.203920-124/tika-app-1.15-20170202.203920-124.jar
[3] http://poi.apache.org/faq.html#faq-N10025

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, February 3, 2017 9:49 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

This kind of information extraction comes from Apache Tika, which is shipped with Solr. However, Solr does not ship every possible parser with its installation. So I think you are hitting a case where Tika manages to figure out what type of content you have, but does not have the library it needs (from Apache POI, another open-source project) installed.

What you need to do is get the additional jar from the Tika/POI project downloads and make it visible to Solr (probably as an extension jar in a lib folder somewhere; I am a bit hazy on that for the latest Solr).

The version of Tika that Solr uses is listed in the change notes. For 6.4, it is https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/CHANGES.txt and it is Tika 1.13.

Hope it helps,
Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 3 February 2017 at 05:57, Gytis Mikuciunas <gyt...@gmail.com> wrote:
> Hi,
>
> I'm using a single-core Solr 6.4 instance on Windows Server (Windows Server
> 2012 R2 Standard), Java v8 (build 1.8.0_121-b13).
>
> Everything works more or less OK, except indexing MS Visio vsdx files.
>
> Every time, it throws an error (no matter whether it is indexing a vsdx
> file or, for example, a docx with a Visio diagram inside).
>
> Thanks in advance for your help. If you need any additional info, please ask.
>
> Error/Exception from log:
>
> Null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory
>     at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
>     at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
>     at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
>     at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
>     at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
>     at org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
>     at org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
>     at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
>     at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>     at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>     at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>     at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:298)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     at org.eclipse.jetty.server.Server.handle(Server.java:534)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>     at java.lang.Thread.run(Unknown Source)
>
> Regards,
> Gytis
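A note on reading the trace above: "Could not initialize class ... GeometryRowFactory" means the JVM found that class, but its static initializer had already failed on an earlier attempt, so the original cause is not shown here. One way to surface it, sketched below under stated assumptions, is a tiny standalone probe that loads the class in a fresh JVM with the same jars. On the first attempt the JVM throws an ExceptionInInitializerError whose cause should name the class or resource that was really missing. `ClassProbe` is a hypothetical helper written for this thread; it is not part of Solr, Tika or POI.

```java
/**
 * Hypothetical diagnostic helper (not part of Solr, Tika or POI).
 * Loads and initializes each named class in a fresh JVM and reports
 * the underlying error instead of a later "Could not initialize class".
 */
public class ClassProbe {

    /** Returns a one-line status for the given fully qualified class name. */
    public static String probe(String className) {
        try {
            // initialize=true runs the static initializer, which is where
            // GeometryRowFactory failed in the reported stack trace.
            Class.forName(className, true, ClassProbe.class.getClassLoader());
            return "OK      " + className;
        } catch (ClassNotFoundException e) {
            // The class itself is not on the classpath at all.
            return "MISSING " + className;
        } catch (ExceptionInInitializerError e) {
            // First failure of a static initializer: the cause is the real error.
            return "INIT    " + className + " -> " + e.getCause();
        } catch (LinkageError e) {
            // NoClassDefFoundError and similar: a dependency could not be linked.
            return "LINK    " + className + " -> " + e;
        }
    }

    public static void main(String[] args) {
        // Default to the classes from the stack trace; any names passed on
        // the command line are probed instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory",
            "org.apache.poi.xdgf.usermodel.XmlVisioDocument"
        };
        for (String name : names) {
            System.out.println(probe(name));
        }
    }
}
```

Compile it with `javac ClassProbe.java` and run it with the same POI/Tika jars Solr loads. For example, on Windows, from the Solr install directory, `java -cp "contrib\extraction\lib\*;." ClassProbe`, assuming those jars live under `contrib\extraction\lib` in your installation. If the INIT line reports a NoClassDefFoundError for an XML schema class, that would point toward the missing-bean case, where swapping "poi-ooxml-schemas" for the full "ooxml-schemas" jar is the suggested workaround.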