Only a few control characters are legal in XML. Removing everything
but newlines, space, and tab is the right thing to do. --wunder
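To make "only a few" concrete, here is a minimal sketch that checks code points against the Char production in section 2.2 of the XML 1.0 spec (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function name is_xml10_char is made up for illustration and is not part of Solr or of any XML library.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Control characters below U+0020 that XML 1.0 accepts.
legal_controls = [cp for cp in range(0x20) if is_xml10_char(chr(cp))]
print(legal_controls)  # [9, 10, 13]
```

By this check, tab, line feed, and carriage return are the only characters below U+0020 that pass, so a stray \x05 in a document is rejected.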
On 12/9/08 5:45 AM, "Peter Wolanin" <[EMAIL PROTECTED]> wrote:
> We have been having this problem also. and have resorted to just
> stripping
> control characters befor
We have been having this problem also, and have resorted to just
stripping control characters before sending the text for indexing:
$text = preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', '', $text);
-Peter
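For anyone preparing documents outside PHP, the same filter can be sketched in Python. This is a rough port of the pattern above, not something from Solr itself; the names CONTROL_CHARS and strip_control_chars are hypothetical. It removes the control characters below U+0020 except tab (\x09), line feed (\x0A), and carriage return (\x0D).

```python
import re

# Same character class as the PHP pattern: control characters
# below U+0020, leaving tab, line feed, and carriage return alone.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Return text with the illegal control characters removed."""
    return CONTROL_CHARS.sub("", text)


cleaned = strip_control_chars("title\x05 with\ttab\n")
print(repr(cleaned))  # 'title with\ttab\n'
```

Running the text through a filter like this before building the update XML keeps the offending bytes away from the parser.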
On Tue, Dec 9, 2008 at 7:59 AM, knietzie <[EMAIL PROTECTED]> wrote:
>
> hi joshua,
>
> i'm having
hi joshua,
i'm having the same problem as yours.
just curious, have you found any fix for this?
thanks
Joshua Reedy wrote:
>
> I have been using a stable dev version of 1.3 for a few months.
> Today, I began testing the final release version, and I encountered a
> strange problem.
> The only t
From the XML 1.0 spec.: "Legal characters are tab, carriage return,
line feed, and the legal graphic characters of Unicode and ISO/IEC
10646." So, \005 is not a legal XML character. It appears the old StAX
implementation was more lenient than it should have been and Woodstox is
doing the corr
My guess is it has to do with switching the StAX implementation to the
Geronimo API and the Woodstox implementation:
https://issues.apache.org/jira/browse/SOLR-770
I'm not sure what the solution is though...
On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote:
I have been using a stable dev versio