I have written a class to deal with this problem.
public class XmlCharFilter {

    /** Returns a copy of the input with every character that is illegal in XML 1.0 removed. */
    public static String doFilter(String in) {
        if (in == null || in.isEmpty()) {
            return ""; // Nothing to filter.
        }
        StringBuilder out = new StringBuilder(in.length()); // Holds the output.
        // Iterate by code point, not by char: a char is only 16 bits wide, so
        // the test against 0x10000..0x10FFFF could never match a single char,
        // and characters stored as surrogate pairs would be dropped.
        for (int i = 0; i < in.length(); ) {
            int current = in.codePointAt(i);
            i += Character.charCount(current);
            if ((current == 0x9) || (current == 0xA) || (current == 0xD)
                    || ((current >= 0x20) && (current <= 0xD7FF))
                    || ((current >= 0xE000) && (current <= 0xFFFD))
                    || ((current >= 0x10000) && (current <= 0x10FFFF))) {
                out.appendCodePoint(current);
            }
        }
        return out.toString();
    }
}
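
For comparison, the same filter can probably be written as a single regular expression. This is only a minimal sketch, not tested against Solr: the class name XmlCharRegexFilter and the method strip are made up for illustration, and it assumes Java 7 or later, because the \x{...} escapes for characters above U+FFFF are not available in older versions.

```java
import java.util.regex.Pattern;

// Hypothetical helper, named here only for illustration.
final class XmlCharRegexFilter {

    // Matches every code point that is NOT in the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Assumption: Java 7 or later, which is needed for the \x{...} escapes.
    private static final Pattern ILLEGAL = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input with all illegal XML 1.0 characters removed.
    static String strip(String in) {
        if (in == null) {
            return "";
        }
        return ILLEGAL.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // Code 22 (0x16) is the control character reported in the error below.
        String dirty = "ab" + (char) 0x16 + "c";
        System.out.println(strip(dirty));
    }
}
```

The regular expression works on whole code points, so characters above U+FFFF are kept and unpaired surrogates are removed, the same as in the loop version.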
2008/12/23 Jarek Zgoda <[email protected]>
> Message written on 2008-12-23, at 14:46, by rohit arora:
>
>
>> When I run the post command to build my index from my (database / XML) file,
>> it gives me an error like this:
>>
>> com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
>> ((CTRL-CHAR, code 22))
>> at [row,col {unknown-source}]: [1676,86]
>>
>> I found a built-in function in Perl to convert all my character data to
>> UTF-8.
>> I find that there are many Unicode characters that are not legal XML
>> characters.
>>
>> Can anyone help me find the list of all the legal XML characters, so
>> that I can strip every character except those?
>>
>
>
> http://en.wikipedia.org/wiki/Unicode_control_characters
>
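> Basically, the illegal ones are the control characters from 0 to 31,
> except tab (9), LF (10) and CR (13). The DEL character (127) is legal
> in XML 1.0, although discouraged.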
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> [email protected]
>
>