Also, there's a custom loader in the stack trace, and it is the culprit:
com.lsegroup.solr.handler.CwsExtractingDocumentLoader
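
For what it's worth, the trace shows Tika's XMLParser handling the document and failing on an unclosed "img" element, which is normal in HTML but not well-formed XML. Below is a minimal sketch that reproduces that kind of failure with the JDK's own SAX parser. The class name ImgTagCheck and the method parseError are made up for illustration; they are not part of the custom loader, Solr, or Tika, and the sample markup is invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class ImgTagCheck {

    // Returns null when the markup is well-formed XML,
    // otherwise the message of the SAXParseException raised by the parser.
    static String parseError(String markup) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler()); // DefaultHandler rethrows fatal errors
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML-style void element: rejected by a strict XML parser.
        System.out.println(parseError("<html><body><img src=\"a.png\"></body></html>"));
        // Self-closed element: accepted, so this prints null.
        System.out.println(parseError("<html><body><img src=\"a.png\"/></body></html>"));
    }
}
```

If the crawled pages are ordinary HTML, the first case is what the strict XML parser in the trace is running into, so the thing to check is how the custom loader decides which Tika parser to use for each document.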

On Nov 14, 2013, at 10:20, Erick Erickson <erickerick...@gmail.com> wrote:

> It looks like bad data. The XML you're sending to Solr looks malformed, so I
> suspect this is completely outside of Solr's purview.
> 
> Best,
> Erick
> 
> 
> On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi <mlore...@sorint.it> wrote:
> 
>> Hi,
>> I have installed a Solr 4.3 instance and configured ManifoldCF to
>> pass web content to the shard collection, but during crawling we have
>> seen many occurrences of this exception:
>> 
>> ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException;
>> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>>        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(
>> CwsExtractingDocumentLoader.java:150)
>>        at org.apache.solr.handler.ContentStreamHandlerBase.
>> handleRequestBody(ContentStreamHandlerBase.java:74)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> RequestHandlerBase.java:135)
>>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.
>> handleRequest(RequestHandlers.java:242)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(
>> SolrDispatchFilter.java:656)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:359)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:155)
>>        at org.apache.catalina.core.ApplicationFilterChain.
>> internalDoFilter(ApplicationFilterChain.java:241)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(
>> ApplicationFilterChain.java:208)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(
>> StandardWrapperValve.java:221)
>>        at org.apache.catalina.core.StandardContextValve.invoke(
>> StandardContextValve.java:107)
>>        at org.apache.catalina.core.StandardHostValve.invoke(
>> StandardHostValve.java:155)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(
>> ErrorReportValve.java:76)
>>        at org.apache.catalina.valves.AccessLogValve.invoke(
>> AccessLogValve.java:934)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(
>> StandardEngineValve.java:90)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(
>> CoyoteAdapter.java:515)
>>        at org.apache.coyote.http11.AbstractHttp11Processor.process(
>> AbstractHttp11Processor.java:1012)
>>        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.
>> process(AbstractProtocol.java:642)
>>        at org.apache.coyote.http11.Http11NioProtocol$
>> Http11ConnectionHandler.process(Http11NioProtocol.java:223)
>>        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.
>> doRun(NioEndpoint.java:1597)
>>        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.
>> run(NioEndpoint.java:1555)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
>> ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:724)
>> Caused by: org.apache.tika.exception.TikaException: XML parse error
>>        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
>>        at org.apache.tika.parser.CompositeParser.parse(
>> CompositeParser.java:242)
>>        at org.apache.tika.parser.CompositeParser.parse(
>> CompositeParser.java:242)
>>        at org.apache.tika.parser.AutoDetectParser.parse(
>> AutoDetectParser.java:120)
>>        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(
>> CwsExtractingDocumentLoader.java:147)
>>        ... 24 more
>> Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105;
>> The element type "img" must be terminated by the matching end-tag "</img>".
>>        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.
>> createSAXParseException(ErrorHandlerWrapper.java:198)
>>        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.
>> fatalError(ErrorHandlerWrapper.java:177)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLErrorReporter.reportError(XMLErrorReporter.java:441)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLErrorReporter.reportError(XMLErrorReporter.java:368)
>>        at com.sun.org.apache.xerces.internal.impl.XMLScanner.
>> reportFatalError(XMLScanner.java:1388)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLDocumentFragmentScannerImpl.scanEndElement(
>> XMLDocumentFragmentScannerImpl.java:1753)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(
>> XMLDocumentFragmentScannerImpl.java:2951)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
>>        at com.sun.org.apache.xerces.internal.impl.
>> XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl
>> .java:511)
>>        at com.sun.org.apache.xerces.internal.parsers.
>> XML11Configuration.parse(XML11Configuration.java:846)
>>        at com.sun.org.apache.xerces.internal.parsers.
>> XML11Configuration.parse(XML11Configuration.java:775)
>>        at com.sun.org.apache.xerces.internal.parsers.XMLParser.
>> parse(XMLParser.java:123)
>>        at com.sun.org.apache.xerces.internal.parsers.
>> AbstractSAXParser.parse(AbstractSAXParser.java:1210)
>>        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$
>> JAXPSAXParser.parse(SAXParserImpl.java:628)
>>        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.
>> parse(SAXParserImpl.java:332)
>>        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
>>        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
>>        ... 28 more
>> 
>> Could the Solr collection be configured incorrectly?
>> 
>> Thanks,
>> Marcello
>> 
