I have written a class to deal with this problem.

public class XmlCharFilter {

    public static String doFilter(String in) {
        StringBuilder out = new StringBuilder(); // Used to hold the output.
        int current; // Used to reference the current code point.

        if (in == null || ("".equals(in)))
            return ""; // vacancy test.

        // Step through the string by code point, not by char: a char is only
        // 16 bits wide, so the 0x10000-0x10FFFF test below could never match
        // a single char and surrogate pairs would be dropped.
        for (int i = 0; i < in.length(); ) {
            current = in.codePointAt(i); // NOTE: No IndexOutOfBoundsException caught
                                         // here; it should not happen.
            i += Character.charCount(current);
            if ((current == 0x9) || (current == 0xA) || (current == 0xD)
                    || ((current >= 0x20) && (current <= 0xD7FF))
                    || ((current >= 0xE000) && (current <= 0xFFFD))
                    || ((current >= 0x10000) && (current <= 0x10FFFF)))
                out.appendCodePoint(current);
        }
        return out.toString();
    }
}

2008/12/23 Jarek Zgoda <jarek.zg...@redefine.pl>

> Message written on 2008-12-23 at 14:46 by rohit arora:
>
>> When I give the post command to build my index on my (database / XML) file,
>> it gives me an error like this:
>>
>> com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 22))
>>  at [row,col {unknown-source}]: [1676,86]
>>
>> I found a built-in function in Perl to convert all my character data to
>> UTF-8, but I find that there are many Unicode characters that are not
>> legal XML characters.
>>
>> Can anyone help me find the list of all the legal XML characters so that
>> I can strip every character except those?
>
> http://en.wikipedia.org/wiki/Unicode_control_characters
>
> Basically, anything from 0 to 31 + DEL character (127).
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> jarek.zg...@redefine.pl
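
For comparison, the XML 1.0 legal character ranges can also be written as a single regular expression. This is only a minimal sketch, assuming Java 7 or later (needed for the \x{...} code point syntax). The class name XmlCharRegexDemo and the method name strip are hypothetical and do not come from this thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: strips every code point outside the
// XML 1.0 "Char" ranges using one precompiled regular expression.
class XmlCharRegexDemo {

    // Negated character class: matches anything that is NOT one of
    // 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
    private static final Pattern ILLEGAL = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    static String strip(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        return ILLEGAL.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // 0x16 (decimal 22) is the control character from the reported error.
        String dirty = "abc\u0016def";
        System.out.println(strip(dirty));
    }
}
```

Tab (0x9), line feed (0xA), and carriage return (0xD) are kept, since XML 1.0 allows them. Only the other control characters below 0x20 are removed, along with the surrogate block and 0xFFFE/0xFFFF.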