Package: libxerces-c3.1
Source: xerces-c
Version: 3.1.1-1

Description of problem:
If a huge file is passed to XMLReader, it calls the TransService multiple times, splitting the file content into several fragments. Unfortunately, a fragment can end in the middle of a multi-byte character, and neither ICUTransService nor IconvGNUTransService handles this: ICUTransService does not handle U_TRUNCATED_CHAR_FOUND, and IconvGNUTransService does not handle EINVAL.

Versions 2.7.0, 2.8.0, 3.0.1 and 3.1.1 all have the same bug.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:

# Compile the SAXPrint sample from xerces-c.
]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; for ((i=0;i<2;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done ; echo; echo '</data>' ) > ~/small.xml
]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; for ((i=0;i<100000;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done ; echo; echo '</data>' ) > ~/big.xml
# small.xml and big.xml have the same structure; big.xml just repeats the GBK text 100000 times instead of twice.
]# samples/SAXPrint ~/small.xml
<?xml version="1.0" encoding="LATIN1"?>
<data>
中文汉字A中文汉字A
</data>

# With ICU:
]# samples/SAXPrint ~/big.xml
<?xml version="1.0" encoding="gbk"?>
<data>
Fatal Error at file /root/big.xml, line 3, char 16377
Message: char 0x6C49 is not representable in 'gbk' encoding

# With IconvGNU:
]# samples/SAXPrint ~/big.xml
<?xml version="1.0" encoding="LATIN1"?>
<data>
Fatal Error at file /root/big.xml, line 3, char 16377
Message: invalid multi-byte sequence

Regards
Kirby Zhou