I have a valid XML document that begins:
<add><doc><field name="id">mdp.39015052775379</field>
<field name="rights">2</field>
<field name="title">Technology transfer and in-house R&D in Indian
industry : in the later 1990s / edited and with an introduction by Binay
Kumar Pattnaik. v.1</field>
<field name="author">Not found</field>
<field name="ocr"> TECHNOLOGY
TRANSFER AND
IN.HOUSE R&D
IN
INDIAN
INDUSTRY
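If a file like this is produced by a script, each field value has to be escaped before it is wrapped in a `<field>` element. A literal `&` must be written as `&amp;` (and `<` as `&lt;`), otherwise the update message is not well-formed XML. Below is a minimal Java sketch. The `XmlFieldEscaper` class and its methods are hypothetical, and the sketch assumes the update XML is built as plain strings. A real indexer might instead use an XML writer or a Solr client library that does the escaping itself.

```java
public class XmlFieldEscaper {
    // Escapes the characters that cannot appear literally in XML character
    // data or in a double-quoted attribute value.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one <field> element of a Solr <add><doc> update message.
    static String field(String name, String value) {
        return "<field name=\"" + escapeXml(name) + "\">" + escapeXml(value) + "</field>";
    }

    public static void main(String[] args) {
        // prints <field name="ocr">IN.HOUSE R&amp;D</field>
        System.out.println(field("ocr", "IN.HOUSE R&D"));
    }
}
```

Escaping has to happen exactly once. Running already-escaped text through `escapeXml` again turns `&amp;` into `&amp;amp;`, which the parser would decode back to the literal text `&amp;`.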
I believe Solr is throwing an exception when it sees the line:
IN.HOUSE R&D
The error message is:
SEVERE: [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32); expected a semi-colon after the reference for entity 'D'
This seems wrong. It is as though the parser has converted &amp;D to &D
and then complains about the missing semicolon.
Can anyone make sense of this?
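One way to test the "decoded twice" theory is to parse a single field both ways with the StAX API that ships with the JDK. This is a minimal sketch. It uses the JDK's default `XMLInputFactory`, not Woodstox (the parser in the trace), so the exact exception message will differ. The class `EntityDemo` and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EntityDemo {
    // Returns all character data in the document; throws if it is not well-formed.
    static String readText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text and entity pieces into a single CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    // True if the parser reads the whole document without an error.
    static boolean isWellFormed(String xml) {
        try {
            readText(xml);
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String escaped = "<field name=\"title\">in-house R&amp;D in Indian</field>";
        String raw = "<field name=\"title\">in-house R&D in Indian</field>";
        System.out.println(readText(escaped)); // in-house R&D in Indian
        System.out.println(isWellFormed(raw)); // false
    }
}
```

With `&amp;D` the reader returns the text `R&D`, so the entity is decoded only once. The unescaped `R&D in` is rejected for the same reason Woodstox gives: `&D` starts a reference to an entity named `D`, and the next character is a space rather than a semicolon.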
Full traceback follows.
Thanks!!
Phil
----
Solr Specification Version: 1.3.0.2008.12.04.08.06.02
Solr Implementation Version: nightly exported - yonik - 2008-12-04 08:06:02
Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 719313 - 2008-11-20 23:51:24
Current Time: Tue Aug 25 17:51:57 EDT 2009
Server Start Time:Tue Aug 25 17:10:44 EDT 2009
Aug 25, 2009 12:42:16 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/mbooks-ls-shard-2 path=/update params={} status=500 QTime=4
Aug 25, 2009 12:42:16 PM org.apache.solr.common.SolrException log
SEVERE: [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32); expected a semi-colon after the reference for entity 'D'
at [row,col {unknown-source}]: [4,57]
at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32); expected a semi-colon after the reference for entity 'D'
at [row,col {unknown-source}]: [4,57]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
at com.ctc.wstx.sr.StreamScanner.parseEntityName(StreamScanner.java:1994)
at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1496)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4681)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 24 more
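Both traces place the failure at `[row,col {unknown-source}]: [4,57]`. That position is counted in the raw payload Solr received, not in the wrapped excerpt quoted in a mail, so it is worth checking which field it actually falls in. Below is a minimal Java sketch that prints that line of a saved copy of the payload with a caret under the column. The class `ErrorLocator`, the method `showPosition`, and the command-line file argument are hypothetical, and the sketch assumes the reported row and column are both 1-based.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ErrorLocator {
    // Returns the given 1-based line of the payload, followed by a caret
    // under the given 1-based column.
    static String showPosition(String payload, int row, int col) {
        // Treat CRLF, CR and LF as line breaks, as an XML parser does
        String[] lines = payload.split("\r\n|\r|\n", -1);
        if (row < 1 || row > lines.length) {
            return "row " + row + " is outside the payload (" + lines.length + " lines)";
        }
        String line = lines[row - 1];
        StringBuilder caret = new StringBuilder();
        for (int i = 1; i < col; i++) {
            // Copy tabs so the caret lines up with the original text
            boolean isTab = i - 1 < line.length() && line.charAt(i - 1) == '\t';
            caret.append(isTab ? '\t' : ' ');
        }
        caret.append('^');
        return line + "\n" + caret;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ErrorLocator <file-posted-to-update>");
            return;
        }
        String payload = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(showPosition(payload, 4, 57));
    }
}
```

Run it against the exact file that was POSTed to `/update`. If the caret lands on the space right after an `&D` that is not written as `&amp;D`, that is the character the parser rejected, whichever field it happens to be in.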