Package: moreutils
Version: 0.31
Severity: normal

JIS code (ISO-2022-JP) is a 7-bit encoding.  Every byte it uses is also
a valid single-byte (ASCII) character in UTF-8, so a simple UTF-8
validity check accepts it.  As a result, the isutf8 command does not
report ISO-2022-JP text as non-UTF-8.
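A minimal Python sketch of the problem follows.  It assumes Python's built-in `iso2022_jp` codec and uses `bytes.decode("utf-8")` as a stand-in for a simple UTF-8 validity check.  This is not isutf8's own code, and `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical stand-in for a plain UTF-8 validity check (not isutf8's code)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "日本語" encoded as ISO-2022-JP: ESC $ B, two-byte JIS X 0208 codes, ESC ( B.
jis = "日本語".encode("iso2022_jp")
print(jis.hex(" "))
print(all(b < 0x80 for b in jis))  # every byte is 7-bit
print(is_valid_utf8(jis))          # so the plain UTF-8 check accepts it
```

By contrast, 8-bit Japanese encodings such as EUC-JP fail the same check, because their high bytes do not form valid UTF-8 sequences.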

If we flag text as non-UTF-8 merely because it contains the byte 0x1B,
we may get false positives on text that contains terminal escape
sequences.
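To illustrate the false positive, here is a small Python sketch of the naive rule.  The function name `flags_any_escape` is hypothetical; it simply treats any 0x1B byte as "not UTF-8", and it rejects perfectly valid UTF-8 text that carries an ANSI color code.

```python
def flags_any_escape(data: bytes) -> bool:
    """Hypothetical naive rule: any 0x1B (ESC) byte means "not UTF-8"."""
    return 0x1B in data


# Valid UTF-8 text wrapped in an ANSI color sequence (ESC [ 31 m ... ESC [ 0 m).
colored = "\x1b[31mrød\x1b[0m".encode("utf-8")
print(colored.decode("utf-8") == "\x1b[31mrød\x1b[0m")  # decodes fine as UTF-8
print(flags_any_escape(colored))                        # but the naive rule flags it
```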

However, the hex sequence "1B 28 42" ("ESC ( B", which switches back
to ASCII) makes it clear that this is not normal UTF-8 but ISO-2022.
At least, this is how I quickly check for ISO-2022-JP.  A heuristic
for detecting ISO-2022-JP could use this simple empirical check, I
think.
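The quick check described above can be sketched in Python as a byte-substring search for "1B 28 42".  The constant and function names are hypothetical, and this is only the empirical check from this report, not a complete ISO-2022-JP detector.

```python
# ESC ( B: designate ASCII, i.e. "switch back to ASCII" in ISO-2022-JP.
ESC_TO_ASCII = b"\x1b\x28\x42"


def looks_like_iso2022_jp(data: bytes) -> bool:
    """Hypothetical quick check: does the text contain ESC ( B?"""
    return ESC_TO_ASCII in data


print(looks_like_iso2022_jp("日本語".encode("iso2022_jp")))
print(looks_like_iso2022_jp("日本語".encode("utf-8")))
```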

Another option is to treat
  "1B non-5B" ("ESC" followed by anything other than "[")
as an indicator of ISO/IEC 2022 or of non-UTF-8 text.  This looks more
general.

As I understand it, ANSI escapes start with "ESC [" (hex "1B 5B").

So this rule remains safe even when ANSI escapes appear in UTF-8 text.
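A minimal Python sketch of the "1B non-5B" rule follows.  The function name is hypothetical, and it is an assumption that an ESC at the very end of the input (with no following byte) is not flagged.

```python
ESC = 0x1B
LEFT_BRACKET = 0x5B  # "[", which starts an ANSI escape sequence


def has_non_ansi_escape(data: bytes) -> bool:
    """Flag ESC followed by anything other than "[" ("1B non-5B").

    A trailing ESC with no following byte is not flagged (an assumption).
    """
    for i in range(len(data) - 1):
        if data[i] == ESC and data[i + 1] != LEFT_BRACKET:
            return True
    return False


print(has_non_ansi_escape("\x1b[1mbold\x1b[0m".encode("utf-8")))  # ANSI only
print(has_non_ansi_escape("日本語".encode("iso2022_jp")))          # ESC $ B, ESC ( B
```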

0x0E and 0x0F (SO and SI, Shift Out and Shift In) also seem to be
used in ISO/IEC 2022, so they could be factored in too.  The nkf
command should provide some guidance for the heuristics.  I see no
reason for 0x0E, 0x0F, or 0x1B to appear in UTF-8 text, apart from the
ANSI escape sequence exception.
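Putting the pieces together, a combined heuristic might look like the Python sketch below.  It is a hypothetical illustration, not a patch to isutf8: the names `looks_like_iso2022` and `is_utf8_text` are invented here, `bytes.decode("utf-8")` stands in for the real validity check, and a trailing ESC is assumed not to count.

```python
SO, SI, ESC, LEFT_BRACKET = 0x0E, 0x0F, 0x1B, 0x5B


def looks_like_iso2022(data: bytes) -> bool:
    """Hypothetical heuristic: SO, SI, or ESC not followed by "["."""
    for i, byte in enumerate(data):
        if byte in (SO, SI):
            return True
        if byte == ESC and i + 1 < len(data) and data[i + 1] != LEFT_BRACKET:
            return True
    return False


def is_utf8_text(data: bytes) -> bool:
    """Valid UTF-8 and no ISO/IEC 2022 control bytes outside ANSI escapes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not looks_like_iso2022(data)


print(is_utf8_text("日本語".encode("iso2022_jp")))            # rejected now
print(is_utf8_text("\x1b[32mgrün\x1b[0m".encode("utf-8")))  # ANSI colors still OK
```

Because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence, scanning the raw bytes for 0x0E, 0x0F, and 0x1B cannot misfire on the middle of a UTF-8 character.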

FYI: http://en.wikipedia.org/wiki/ISO/IEC_2022

-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages moreutils depends on:
ii  libc6                         2.7-18     GNU C Library: Shared libraries
ii  perl                          5.10.0-19  Larry Wall's Practical Extraction 

moreutils recommends no packages.

Versions of packages moreutils suggests:
pn  libtime-duration-perl         <none>     (no description available)
ii  libtimedate-perl              1.1600-9   Time and date functions for Perl

-- no debconf information
