Re: problem index accented character with release version of solr 1.3

2008-12-09 Thread Walter Underwood
Only a few control characters are legal in XML. Removing everything but newlines, space, and tab is the right thing to do. --wunder On 12/9/08 5:45 AM, "Peter Wolanin" <[EMAIL PROTECTED]> wrote: > We have been having this problem also. and have resorted to just stripping > control characters befor

Re: problem index accented character with release version of solr 1.3

2008-12-09 Thread Peter Wolanin
We have been having this problem also, and have resorted to just stripping control characters before sending the text for indexing: preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', '', $text); -Peter On Tue, Dec 9, 2008 at 7:59 AM, knietzie <[EMAIL PROTECTED]> wrote: > > hi joshua, > > i'm having
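
For anyone not on PHP, here is a rough Python sketch of the same stripping step. It assumes the text is already a decoded Unicode string; the function name strip_control_chars is made up for illustration and is not part of Solr or any client library. The character class is copied from the preg_replace pattern above, so tab, line feed, and carriage return are kept and accented characters are left alone.

```python
import re

# Same character class as the PHP pattern above: every C0 control character
# except tab (\x09), line feed (\x0A), and carriage return (\x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Return text with the disallowed control characters removed.

    Hypothetical helper for illustration; call it on each field value
    before building the XML update message.
    """
    return _CONTROL_CHARS.sub("", text)


# The stray \x05 is dropped, the accented character and newline survive.
cleaned = strip_control_chars("caf\u00e9\x05 au lait\n")
```

This only covers the range the PHP pattern covers. It does not touch other code points that XML 1.0 also excludes, such as U+FFFE and U+FFFF.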

Re: problem index accented character with release version of solr 1.3

2008-12-09 Thread knietzie
hi joshua, i'm having the same problem as yours. just curious, have you found any fix for this? thanks Joshua Reedy wrote: > > I have been using a stable dev version of 1.3 for a few months. > Today, I began testing the final release version, and I encountered a > strange problem. > The only t

Re: problem index accented character with release version of solr 1.3

2008-09-18 Thread Sean Timm
From the XML 1.0 spec.: "Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646." So, \005 is not a legal XML character. It appears the old StAX implementation was more lenient than it should have been and Woodstox is doing the corr
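
To make the rule concrete, here is a small Python sketch that checks a character against the Char production in the XML 1.0 spec (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) and then shows a conforming parser rejecting \005. The helper names are invented for this example, and the parser used is Python's bundled expat rather than Woodstox, so treat it as an illustration of the spec rule and not of Solr's actual code path.

```python
import xml.etree.ElementTree as ET


def is_legal_xml_char(ch: str) -> bool:
    """Check one character against the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses_as_xml(payload: str) -> bool:
    """Return True if the bundled expat parser accepts the payload."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


# \x05 fails the Char production; the accented character passes it.
control_ok = is_legal_xml_char("\x05")
accent_ok = is_legal_xml_char("\u00e9")

# A strict parser refuses the document that carries the raw control character.
bad_doc_ok = parses_as_xml("<field>caf\u00e9\x05</field>")
good_doc_ok = parses_as_xml("<field>caf\u00e9</field>")
```

The accented character itself is legal XML. What trips the parser is the control character sitting next to it.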

Re: problem index accented character with release version of solr 1.3

2008-09-17 Thread Ryan McKinley
My guess is it has to do with switching the StAX implementation to the Geronimo API and the Woodstox implementation: https://issues.apache.org/jira/browse/SOLR-770 I'm not sure what the solution is though... On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote: I have been using a stable dev versio