Package: libxerces-c3.1
Source: xerces-c
Version: 3.1.1-1

Description of problem:

If a huge file is passed to XMLReader, XMLReader calls the TransService
multiple times, splitting the file content into several fragments.
Unfortunately, a fragment can end in the middle of a multi-byte character,
and neither ICUTransService nor IconvGNUTransService handles this case:
ICUTransService does not handle U_TRUNCATED_CHAR_FOUND, and
IconvGNUTransService does not handle EINVAL.
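For illustration, below is a minimal sketch of the usual way to handle a fragment boundary that splits a character. It uses the plain POSIX iconv() API rather than Xerces-C's IconvGNUTransService, whose internals are not shown here. The helper name convert_chunked, the fixed GBK to UTF-8 direction and the 64-byte staging buffer are assumptions made for this sketch. When iconv() fails with EINVAL, the unconsumed tail bytes are kept and prepended to the next fragment instead of being reported as an invalid multi-byte sequence. Only a tail still pending at the real end of input is treated as an error.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert `len` bytes of GBK input to UTF-8, feeding
 * iconv() at most `chunk` bytes at a time to mimic a reader that hands the
 * transcoder one fragment per call. Returns the number of UTF-8 bytes
 * written to `out`, or -1 on error. */
static long convert_chunked(const char *in, size_t len, size_t chunk,
                            char *out, size_t outcap)
{
    if (chunk == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", "GBK");
    if (cd == (iconv_t)-1)
        return -1;

    char buf[64];        /* carried-over bytes + the next fragment */
    size_t pending = 0;  /* bytes of an incomplete character kept from last call */
    size_t pos = 0;
    char *outp = out;
    size_t outleft = outcap;

    while (pos < len) {
        size_t n = (len - pos < chunk) ? len - pos : chunk;
        if (pending + n > sizeof buf) {   /* fragment too large for staging buffer */
            iconv_close(cd);
            return -1;
        }
        memcpy(buf + pending, in + pos, n);
        pos += n;

        char *inp = buf;
        size_t inleft = pending + n;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
            && errno != EINVAL) {
            /* EILSEQ (really invalid input) or E2BIG (output full). */
            iconv_close(cd);
            return -1;
        }
        /* On EINVAL, inp points at an incomplete multi-byte sequence at the
         * end of the fragment: keep it and retry once more bytes arrive. */
        memmove(buf, inp, inleft);
        pending = inleft;
    }

    iconv_close(cd);
    /* A partial character left at the real end of input is a genuine error. */
    return pending ? -1 : (long)(outcap - outleft);
}
```

Calling convert_chunked() on the 18 bytes of character data from small.xml with a chunk size of 3 forces several fragment boundaries to fall inside a character, yet it still produces the same 26 bytes of UTF-8 as a single-shot conversion. On the ICU side, the comparable mechanism is the flush argument of ucnv_toUnicode(): passing FALSE for every fragment except the last lets the converter keep a partial character in its internal state, so U_TRUNCATED_CHAR_FOUND should only surface for input that is genuinely truncated.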

Versions 2.7.0, 2.8.0, 3.0.1 and 3.1.1 all have the same bug.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:

# compile the SAXPrint example of xerces-c.

]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; \
     for ((i=0;i<2;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done; \
     echo; echo '</data>' ) > ~/small.xml

]# ( echo '<?xml version="1.0" encoding="GBK" ?>'; echo '<data>'; \
     for ((i=0;i<100000;++i)); do echo -en '\xd6\xd0\xce\xc4\xba\xba\xd7\xd6A'; done; \
     echo; echo '</data>' ) > ~/big.xml

# small.xml and big.xml have the same structure; big.xml just repeats the pattern 100000 times instead of 2.

]# samples/SAXPrint ~/small.xml 
<?xml version="1.0" encoding="LATIN1"?>
<data>
&#x4e2D;&#x6587;&#x6C49;&#x5B57;A&#x4e2D;&#x6587;&#x6C49;&#x5B57;A
</data>

# built with the ICU transcoder
]# samples/SAXPrint ~/big.xml 
<?xml version="1.0" encoding="gbk"?> 
<data> 
Fatal Error at file /root/big.xml, line 3, char 16377 
  Message: char 0x6C49 is not representable in 'gbk' encoding

# built with the IconvGNU transcoder
]# samples/SAXPrint ~/big.xml 
<?xml version="1.0" encoding="LATIN1"?>
<data>
Fatal Error at file /root/big.xml, line 3, char 16377 
  Message: invalid multi-byte sequence

  Regards
  Kirby Zhou