I have written a class to deal with this problem.
public class XmlCharFilter {
    public static String doFilter(String in) {
        StringBuffer out = new StringBuffer(); // Used to hold the output.
        char current; // Used to reference the current character.
        if (in == null || ("".equals(in)))
I believe you can use the following Unicode characters in XML
documents: U+0009, U+000A, U+000D, [U+0020-U+D7FF], [U+E000-U+FFFD],
and [U+10000-U+10FFFF].
One of your documents contains character code 22 (U+0016), a control
character that is not allowed in XML.
http://www.unicode.org/unicode/reports/tr
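
A minimal sketch of how such characters could be stripped before the
document is posted, assuming it is acceptable to silently drop them.
The class and method names (XmlCharSanitizer, isValidXmlChar,
stripInvalidXmlChars) are made up for illustration and are not part of
Solr or Woodstox; the ranges are the ones listed above.

```java
// Hypothetical helper for illustration; not a Solr or Woodstox class.
final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** Returns true if the code point falls in one of the ranges allowed in XML 1.0. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of the input with every disallowed code point removed.
     * Assumption: null and empty input are returned unchanged.
     */
    static String stripInvalidXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return in;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Read a full code point so surrogate pairs are handled as one unit.
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Each field value would be passed through stripInvalidXmlChars before it
is written into the XML, so character code 22 never reaches the parser.
Unpaired surrogates fall outside the allowed ranges and are dropped too.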
Message written on 2008-12-23, at 14:46, by rohit arora:
When I run the post command to build my index on my (database / XML)
file, it gives me
an error like this:
com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 22))
at [row,col {unknown-
Thanks a lot for the time you spent understanding my problem and
checking for a solution in Neko!
It helps a lot.
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Friday, April 27, 2007 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Unicode characters
: -fetch a web page
: -decode entities and Unicode characters (such as &#149;) using Neko
: library
: -get a unicode String in Java
: -Send it to SOLR through XML created by SAX, with the right encoding
: (UTF-8) specified everywhere (writer, header, etc.)
: -it apparently arrives clean on the SO
On 4/27/07, HUYLEBROECK Jeremy RD-ILAB-SSF
-In the query output from SOLR (XML message), the character is not
encoded as an entity (not &#149;) but the character itself is used
(character 149=95 hexadecimal).
That's fine, as they are equivalent representations, and that
character is directly represe