Only a few control characters are legal in XML. Removing everything
but newlines, space, and tab is the right thing to do. --wunder
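
To make the point above concrete, here is a minimal sketch (not from the original thread) that uses Python's standard-library XML parser to show a control character being rejected. XML 1.0 permits only tab (0x09), line feed (0x0A), and carriage return (0x0D) below 0x20. The helper name `is_well_formed` and the `<field>` element are assumptions made for illustration; Solr itself uses the Woodstox parser seen in the stack trace further down, not this one.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Tab and newline are legal XML 1.0 characters, so this should parse.
print(is_well_formed("<field>Andr\u00e9\t3000\n</field>"))

# U+0005 is a control character that XML 1.0 does not allow,
# so a conforming parser should reject this document.
print(is_well_formed("<field>Andr\x05\u00e9 3000</field>"))
```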

On 12/9/08 5:45 AM, "Peter Wolanin" <[EMAIL PROTECTED]> wrote:

> We have been having this problem also, and have resorted to just
> stripping control characters before sending the text for indexing:
>
> preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', '', $text);
>
> -Peter
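
For readers not using PHP, the same stripping step can be sketched in Python. This is an illustrative equivalent of the `preg_replace` call above, not code from the thread. The function name `strip_control_chars` is an assumption. The character class is copied from Peter's pattern, so tab, line feed, and carriage return are left in place.

```python
import re

# Same character class as the PHP pattern quoted above: every C0 control
# character except tab (0x09), line feed (0x0A), and carriage return (0x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Remove control characters that are not legal in XML 1.0."""
    return _CONTROL_CHARS.sub("", text)


# The value from the original report, with the stray \005 removed
# before the text is sent for indexing.
cleaned = strip_control_chars("Andr\x05\u00e9 3000")
print(repr(cleaned))
```

Stripping on the client keeps the fix independent of the Solr version, at the cost of silently dropping bytes that may point to an upstream encoding bug.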

> On Tue, Dec 9, 2008 at 7:59 AM, knietzie <[EMAIL PROTECTED]> wrote:
>
> hi joshua,
>
> i'm having the same problem as yours.
> just curious, have you found any fix for this?
>
> thanks
>
>
> Joshua Reedy wrote:
>>
>> I have been using a stable dev version of 1.3 for a few months.
>> Today, I began testing the final release version, and I encountered a
>> strange problem.
>> The only thing that has changed in my setup is the solr code (I didn't
>> make any config change or change the schema).
>>
>> a document has a text field with a value that contains:
>> "Andr\005é 3000"
>>
>> Indexing the document by itself or as part of a batch, produces the
>> following error:
>> Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log
>> SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 5))
>>  at [row,col {unknown-source}]: [5,205]
>>         at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>>         at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>>         at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>>         at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>>         at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>>         at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>         at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
>>         at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
>>         at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>         at java.lang.Thread.run(Thread.java:595)
>>
>> The latest version of Solr doesn't seem to like control characters
>> (\005, in this case), but previous versions handled them (or at least
>> ignored them).
>>
>> These characters shouldn't be in my documents, so there's a bug on my
>> end to track down. However, I'm wondering if this was an expected
>> change or an unintended consequence of recent work . . .
>>
>>
>>
>>
>> --
>> -------------------------------------------------------------------------------------------------
>> Be who you are and say what you feel,
>> because those who mind don't matter and
>> those who matter don't mind.
>>  -- Dr. Seuss
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/problem-index-accented-character-with-release-version-of-solr-1.3-tp19544660p20914244.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



> --
> --------------------------------------------------------------
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> [EMAIL PROTECTED]

