My guess is that it has to do with switching StAX to the Geronimo API
jar and the Woodstox implementation:
https://issues.apache.org/jira/browse/SOLR-770
I'm not sure what the solution is though...
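One possible client-side workaround, as a rough sketch rather than a fix in Solr itself: strip the characters that XML 1.0 does not allow before building the update request. The class and method names here (XmlCharFilter, stripInvalidXmlChars) are made up for illustration and are not part of Solr or Woodstox; the allowed ranges follow the Char production in the XML 1.0 spec.

```java
// Hypothetical helper, not part of Solr or Woodstox.
// Removes code points that the XML 1.0 "Char" production forbids,
// such as the \005 control character in the report quoted in this thread.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** Returns true if the code point is legal in an XML 1.0 document. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every illegal code point dropped. */
    static String stripInvalidXmlChars(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Walk by code point so surrogate pairs are kept together.
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlCharFilter.stripInvalidXmlChars(value) on each field value before it is serialized into the update XML would drop the \005 from the example in the quoted message. Silently dropping characters can hide the upstream bug, so logging whenever something is removed may be worthwhile.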
On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote:
I have been using a stable dev version of 1.3 for a few months.
Today, I began testing the final release version, and I encountered a
strange problem.
The only thing that has changed in my setup is the solr code (I didn't
make any config change or change the schema).
A document has a text field with a value that contains:
"Andr\005é 3000"
Indexing the document, by itself or as part of a batch, produces the
following error:
Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 5))
 at [row,col {unknown-source}]: [5,205]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:595)
The latest version of Solr doesn't seem to like control characters
(\005, in this case), but previous versions handled them (or at least
ignored them).
These characters shouldn't be in my documents, so there's a bug on my
end to track down. However, I'm wondering whether this was an expected
change or an unintended consequence of recent work...
--
-------------------------------------------------------------------------------------------------
Be who you are and say what you feel,
because those who mind don't matter and
those who matter don't mind.
-- Dr. Seuss