Package: moreutils
Version: 0.31
Severity: normal

JIS code (ISO-2022-JP) is a 7-bit encoding. Every byte it produces is below 0x80, so it uses only byte values that a simple UTF-8 validity check accepts. As a result, this isutf8 command does not detect it as non-UTF-8.
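To illustrate the point, here is a minimal Python sketch (not part of moreutils). It encodes sample Japanese text with Python's built-in `iso2022_jp` codec and runs the result through a plain UTF-8 decode, which is assumed here as a stand-in for the validity check isutf8 performs. The sample string and the helper name `is_valid_utf8` are illustrative only.

```python
# Sketch: ISO-2022-JP output is pure 7-bit, so a plain UTF-8 validity
# check accepts it. is_valid_utf8 is a hypothetical stand-in for isutf8.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


jis = "日本語".encode("iso2022_jp")     # arbitrary sample text

print(jis)                              # ESC-based shift sequences plus 7-bit bytes
print(all(b < 0x80 for b in jis))       # True: every byte is 7-bit
print(is_valid_utf8(jis))               # True: so it is not flagged as non-UTF-8
```

The same text encoded as EUC-JP contains bytes above 0x7F that do not form valid UTF-8 sequences, so only the 7-bit ISO-2022-JP form slips through.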
If we flag text as non-UTF-8 solely because it contains the byte 0x1B, we may misclassify text that contains terminal escape sequences. But the byte sequence "1B 28 42" (ESC ( B, which switches back to ASCII) makes it clear that the text is ISO-2022 rather than ordinary UTF-8. At least, this is how I quickly check for ISO-2022-JP, and I think a heuristic for detecting ISO-2022-JP could use this simple empirical check.

Another option is to treat "1B non-5B" ("ESC" followed by anything other than "[") as an indicator of ISO/IEC 2022 or other non-UTF-8 text. This looks more general. As I understand it, ANSI escape sequences start with "ESC [" (1B 5B), so this rule stays safe even when ANSI escapes are used in UTF-8 text. The bytes 0x0E and 0x0F (SO and SI) also seem to be used in ISO/IEC 2022, so they could be factored in too. The nkf command should give some guidance for these heuristics. Apart from the ANSI escape sequence exception, I see no reason for 0x0E, 0x0F, or 0x1B to appear in UTF-8 text.

FYI: http://en.wikipedia.org/wiki/ISO/IEC_2022

-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages moreutils depends on:
ii  libc6                  2.7-18     GNU C Library: Shared libraries
ii  perl                   5.10.0-19  Larry Wall's Practical Extraction

moreutils recommends no packages.

Versions of packages moreutils suggests:
pn  libtime-duration-perl  <none>     (no description available)
ii  libtimedate-perl       1.1600-9   Time and date functions for Perl

-- no debconf information
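The heuristics proposed in this report (look for "1B 28 42", or more generally flag any ESC not followed by "[" and any 0x0E/0x0F byte) can be sketched as follows. This is a hypothetical Python illustration, not isutf8's actual code; the function names `has_jis_return_to_ascii` and `looks_like_iso2022` are made up for this example.

```python
# Hypothetical heuristics (not part of moreutils) for spotting 7-bit
# ISO/IEC 2022 text that a plain UTF-8 check would accept.

ESC, SO, SI, LBRACKET = 0x1B, 0x0E, 0x0F, 0x5B
JIS_RETURN_TO_ASCII = b"\x1b(B"  # "1B 28 42": ESC ( B, switch back to ASCII


def has_jis_return_to_ascii(data: bytes) -> bool:
    """Narrow check: the ISO-2022-JP 'back to ASCII' sequence is present."""
    return JIS_RETURN_TO_ASCII in data


def looks_like_iso2022(data: bytes) -> bool:
    """General check: any SO/SI byte, or an ESC not followed by '['.

    ANSI CSI sequences such as ESC [ 31 m are tolerated. A trailing ESC
    with nothing after it is also treated as suspicious.
    """
    for i, b in enumerate(data):
        if b in (SO, SI):
            return True
        if b == ESC:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt != LBRACKET:
                return True
    return False


samples = {
    "ISO-2022-JP": "日本語".encode("iso2022_jp"),
    "UTF-8 with ANSI color": "\x1b[31m日本語\x1b[0m".encode("utf-8"),
    "plain UTF-8": "日本語".encode("utf-8"),
}
for name, data in samples.items():
    print(name, looks_like_iso2022(data), has_jis_return_to_ascii(data))
```

Because UTF-8 multibyte sequences never contain bytes below 0x80, the bytes 0x0E, 0x0F, and 0x1B can only appear in UTF-8 text as themselves, which is what makes this byte-level check workable. One limitation of the "ESC non-[" rule: terminal sequences that do not start with "ESC [" (for example OSC sequences beginning with "ESC ]") would also be flagged.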