Package: coreutils
Version: 8.4-1
Severity: normal
File: /usr/bin/uniq

~$ locale charmap
UTF-8
~$ locale collate-codeset
UTF-8
~$ sort .zsh-history|uniq -D|sed -n l
cd Pyr\202n\202es$
cd Pyr\351n\351es$


The two lines differ only in their invalid UTF-8 bytes
(\202 versus \351), yet uniq reports them as identical.

"sort -u" and "comm" also treat them as identical:
~$ echo '\0300\n\0301' | sort -u | sed -n l
\300$
~$ sed -n l a
cd Pyr\202n\202es$
~$ sed -n l b
cd Pyr\351n\351es$
~$ comm -12 a b | sed -n l
cd Pyr\351n\351es$

If this is the expected behavior, it should be documented more
clearly. The sentence "Comparisons honor the rules specified
by the `LC_COLLATE' locale category." is not enough to cover
this rather unintuitive result.
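
For comparison, here is a minimal sketch of the same check run under
the C locale, where lines are compared byte by byte. The temporary
directory and the variable names are made up for illustration; the
file contents are the two lines shown above. I would expect
"sort -u" to keep both lines and "comm -12" to find nothing in
common, but treat this as a sketch rather than a verified result
for every coreutils version.

```shell
# Hypothetical scratch directory holding the two one-line files.
tmpdir=$(mktemp -d)

# The lines differ only in the invalid UTF-8 bytes (octal 202 vs 351).
printf 'cd Pyr\202n\202es\n' > "$tmpdir/a"
printf 'cd Pyr\351n\351es\n' > "$tmpdir/b"

# Force bytewise comparison by overriding the locale for each command.
sort_count=$(cat "$tmpdir/a" "$tmpdir/b" | LC_ALL=C sort -u | wc -l)
comm_count=$(LC_ALL=C comm -12 "$tmpdir/a" "$tmpdir/b" | wc -l)

echo "LC_ALL=C sort -u kept $sort_count line(s)"
echo "LC_ALL=C comm -12 found $comm_count common line(s)"

rm -rf "$tmpdir"
```

If the counts differ from what the UTF-8 locale gives, the collation
rules of the locale are what makes the two lines compare as equal.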

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (50, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-trunk-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_US.ISO-8859-15 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages coreutils depends on:
ii  libacl1                       2.2.49-1   Access control list shared library
ii  libc6                         2.10.2-5   Embedded GNU C Library: Shared lib
ii  libselinux1                   2.0.89-4   SELinux runtime shared libraries

coreutils recommends no packages.

coreutils suggests no packages.

-- debconf-show failed


