Hi Jack,
we have analyzed the issue and there were duplicated jar into the tomcat classpath for Tika. After the removal of the dulicated library now the search engine works as expected.

Thanks for the support,
Marcello

On 11/14/2013 05:24 PM, Jack Krupansky wrote:
The actual error appears to be:

Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".

So, check the input document at line 91, column 105. There should be an <img> tag there, but SAX is complaining that there is no matching </img>.

-- Jack Krupansky

-----Original Message----- From: Marcello Lorenzi
Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception

Hi,
I have installed a Solr 4.3 instance and we have configured manifoldcf
to pass web content to the shard collection, but during the crawling we
have noticed a lot of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: XML parse error
        at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
        at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
        at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
        at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
        at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
        at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
        at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
        at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
        at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
        at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
        at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at
com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
        ... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber:
105; The element type "img" must be terminated by the matching end-tag
"</img>".
        at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
        at
com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
        at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
        at
com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
        at
com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
        at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
        at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
        at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
        at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
        at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
        at
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
        at
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
        at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
        at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:332)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
        ... 28 more

Could it be not configured correctly the SOLR collection?

Thanks,
Marcello


Reply via email to