found 466341 5.10.0-19
retitle 466341 support the Encode::decode CHECK argument with ISO-2022-JP
severity 466341 wishlist
thanks

On Mon, Feb 18, 2008 at 01:36:55AM -0500, Bryan Donlan wrote:
> Package: perl
> Version: 5.8.8-12
> Severity: normal
>
> Converting a certain sequence of ISO-2022-JP text to utf8 succeeds:
> $ perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print
> encode("utf8", decode("iso-2022-jp", $s, Encode::FB_CROAK)), "\n"'
> {⑨}
>
> However, converting it back to ISO-2022-JP fails:
> $ perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print
> encode("iso-2022-jp", decode("iso-2022-jp", $s, Encode::FB_CROAK)),
> "\n"'
> {\x{2468}}
>
> It should be noted that iconv rejects this entirely:
> $ perl -MEncode -e '$s= "{\x1b\x24\x42\x2d)\x1b(B}"; print $s,
> "\n"'|iconv -f iso-2022-jp -t utf8
> {iconv: illegal input sequence at position 4
>
> However, if this is truly invalid iso-2022-jp, perl should croak on it, since
> FB_CROAK was passed.

It is indeed an invalid sequence; iconv is right about that. The original
JIS-C-6226 (a.k.a. JIS X 0208) standard can be found at e.g. [1], and it
does not contain 0x2d 0x29, which is the two-byte sequence embedded in your
iso-2022-jp coded example.

The bug here seems to be that the corresponding Encode module ignores
the CHECK argument. The Encode documentation states:

    NOTE: Not all encoding support this feature

    Some encodings ignore CHECK argument. For example, Encode::Unicode
    ignores CHECK and it always croaks on error.

Since this is documented behaviour, I'm lowering the severity to wishlist.

[1] http://www.itscj.ipsj.or.jp/ISO-IR/087.pdf

Cheers,
--
Niko Tyni nt...@debian.org
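
To make the 0x2d 0x29 observation concrete, here is a minimal sketch of the row/cell (kuten) arithmetic for a two-byte JIS X 0208 code. The `kuten` helper is hypothetical and not part of Encode or iconv. The check assumes that rows 9 to 15 are unassigned in JIS X 0208 itself; it does not consult the full code table, so it says nothing about unassigned cells in other rows.

```shell
#!/bin/sh
# Hypothetical helper (not part of Encode or iconv): map a two-byte
# JIS X 0208 code, given as two hex bytes, to its row-cell position.
kuten() {
    row=$(( 0x$1 - 0x20 ))     # first byte  -> row number
    cell=$(( 0x$2 - 0x20 ))    # second byte -> cell number
    # Assumption: rows 9..15 are unassigned in JIS X 0208 itself.
    if [ "$row" -ge 9 ] && [ "$row" -le 15 ]; then
        echo "$row-$cell unassigned"
    else
        # No claim is made here about whether this cell is assigned.
        echo "$row-$cell"
    fi
}

# The two bytes between the escape sequences in the reported string.
kuten 2d 29
```

For the bytes in the report this prints `13-9 unassigned`, which would be consistent with iconv rejecting the input.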