Re: Unicode characters that are not legal XML characters;

2008-12-23 Thread lucas song
I have written a class to deal with this problem. public class XmlCharFilter { public static String doFilter(String in) { StringBuffer out = new StringBuffer(); // Used to hold the output. char current; // Used to reference the current character. if (in == null || ("".equals(in)))

Re: Unicode characters that are not legal XML characters

2008-12-23 Thread Bryan Talbot
I believe you can use the following Unicode characters in XML documents: U+0009, U+000A, U+000D, [U+0020-U+D7FF], [U+E000-U+FFFD], and [U+10000-U+10FFFF]. One of your documents contains a U0022 character which is an invalid space character for XML. http://www.unicode.org/unicode/reports/tr
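The filtering approach discussed in this thread could be sketched roughly as follows. This is a minimal illustration, not the XmlCharFilter class posted above: the class name XmlCharSketch and both method names are hypothetical, and the ranges are taken from the Char production of the XML 1.0 specification. It walks the string by code point so that supplementary characters survive and unpaired surrogates are dropped.

```java
final class XmlCharSketch {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every illegal XML 1.0 character removed. */
    static String stripIllegalXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return in;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // An unpaired surrogate comes back as its own value (0xD800-0xDFFF),
            // which is outside the legal ranges and is therefore skipped.
            int cp = in.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Code 22 is the value reported in the error message quoted in this thread;
        // reading it as a decimal code (U+0016) is an assumption.
        String dirty = "foo" + (char) 22 + "bar";
        System.out.println(stripIllegalXmlChars(dirty)); // prints "foobar"
    }
}
```

Silently dropping characters is only one possible policy; replacing them with a space or U+FFFD may suit some indexes better.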

Re: Unicode characters that are not legal XML characters;

2008-12-23 Thread Jarek Zgoda
Message written on 2008-12-23, at 14:46, by rohit arora: When I give the post command to build my index on my (databases / XML) file, it gives me an error like this: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 22)) at [row,col {unknown-

RE: Unicode characters

2007-05-01 Thread HUYLEBROECK Jeremy RD-ILAB-SSF
Thanks a lot for the time you spent understanding my problem and checking for a solution in Neko! It helps a lot. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 4:02 PM To: solr-user@lucene.apache.org Subject: Re: Unicode characters

Re: Unicode characters

2007-04-27 Thread Chris Hostetter
: -fetch a web page
: -decode entities and unicode characters (such as &#149;) using the Neko library
: -get a unicode String in Java
: -Send it to SOLR through XML created by SAX, with the right encoding (UTF-8) specified everywhere (writer, header etc...)
: -it apparently arrives clean on the SO

Re: Unicode characters

2007-04-27 Thread Yonik Seeley
On 4/27/07, HUYLEBROECK Jeremy RD-ILAB-SSF wrote: -In the query output from SOLR (XML message), the character is not encoded as an entity (not &#149;) but the character itself is used (character 149=95 hexadecimal). That's fine, as they are equivalent representations, and that character is directly represe
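The equivalence mentioned in this reply can be checked with a small sketch. It assumes the JDK's built-in DOM parser rather than anything Solr-specific, and the class name EntityEquivalenceSketch, the helper parseText and the element name d are hypothetical. It parses one document that spells character 149 as a numeric character reference and one that contains the character itself, then compares the resulting text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class EntityEquivalenceSketch {

    /** Parses a UTF-8 XML string and returns the text content of its root element. */
    static String parseText(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // Wrapped so that callers do not need to declare checked exceptions.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Character 149 (hex 95) written as a numeric character reference.
        String viaReference = parseText(header + "<d>&#149;</d>");
        // The same character written directly into the document.
        String viaLiteral = parseText(header + "<d>\u0095</d>");

        System.out.println(viaReference.equals(viaLiteral));
        System.out.println(Integer.toHexString(viaReference.codePointAt(0)));
    }
}
```

Once parsed, a consumer cannot tell which of the two spellings the producer used, so an indexing client does not need to force the entity form.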