Hi Jack,
we have analyzed the issue and there were duplicated jar into the tomcat
classpath for Tika. After the removal of the dulicated library now the
search engine works as expected.
Thanks for the support,
Marcello
On 11/14/2013 05:24 PM, Jack Krupansky wrote:
The actual error appears to be:
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
So, check the input document at line 91, column 105. There should be
an <img> tag there, but SAX is complaining that there is no matching
</img>.
-- Jack Krupansky
-----Original Message----- From: Marcello Lorenzi
Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception
Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf
to pass web content to the shard collection, but during the crawling we
have noticed a lot of this exception:
ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
at
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
at
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:332)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
... 28 more
Could it be not configured correctly the SOLR collection?
Thanks,
Marcello