I have written a class to deal with this problem.
public class XmlCharFilter {
    // Returns a copy of the input with every character that is not legal
    // in XML 1.0 removed.
    public static String doFilter(String in) {
        if (in == null || in.isEmpty())
            return ""; // Nothing to filter.
        StringBuilder out = new StringBuilder(in.length()); // Holds the output.
        // Walk the string by code point rather than by char. A Java char is
        // only 16 bits, so a char-based loop can never see a value in
        // 0x10000-0x10FFFF and would drop both halves of a surrogate pair.
        for (int i = 0; i < in.length(); ) {
            int current = in.codePointAt(i); // The current code point.
            i += Character.charCount(current); // Advance by 1 or 2 chars.
            if ((current == 0x9) || (current == 0xA) || (current == 0xD)
                    || ((current >= 0x20) && (current <= 0xD7FF))
                    || ((current >= 0xE000) && (current <= 0xFFFD))
                    || ((current >= 0x10000) && (current <= 0x10FFFF)))
                out.appendCodePoint(current);
        }
        return out.toString();
    }
}
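
As an alternative sketch (not a replacement for the class above), the same filtering can probably be expressed with a single regular expression. The class name XmlCharRegexFilter and the method name strip are hypothetical, chosen only for illustration. The character ranges are assumed to be the same ones used in the class above, and the \x{...} escape requires Java 7 or later.

```java
import java.util.regex.Pattern;

final class XmlCharRegexFilter {
    // Matches any single code point that falls outside the ranges
    // accepted by the filter above (assumed to be the XML 1.0 legal set).
    private static final Pattern ILLEGAL = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input with every illegal character removed;
    // a null input yields an empty string.
    static String strip(String in) {
        if (in == null) {
            return "";
        }
        return ILLEGAL.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // Character code 22 is the control character reported in the
        // error message quoted below.
        String dirty = "legal" + (char) 22 + " text";
        System.out.println(strip(dirty)); // prints "legal text"
    }
}
```

Java's regex engine matches by code point, so supplementary characters stored as surrogate pairs are kept, while unpaired surrogates are removed.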



2008/12/23 Jarek Zgoda <jarek.zg...@redefine.pl>

> Message written on 2008-12-23, at 14:46, by rohit arora:
>
>
>> When I issue the post command to build my index from my (database / XML)
>> file, it gives me an error like this:
>>
>> com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
>> ((CTRL-CHAR, code 22))
>>  at [row,col {unknown-source}]: [1676,86]
>>
>> I found a built-in function in Perl to convert all my character data to
>> UTF-8, but I find that there are many Unicode characters that are not
>> legal XML characters.
>>
>> Can anyone help me find the list of all legal XML characters, so that
>> I can strip every character except those?
>>
>
>
> http://en.wikipedia.org/wiki/Unicode_control_characters
>
> Basically, the ones to strip are anything from 0 to 31, plus the DEL character (127).
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> jarek.zg...@redefine.pl
>
>
