Package: coreutils
Version: 8.4-1
Severity: normal
File: /usr/bin/uniq

~$ locale charmap
UTF-8
~$ locale collate-codeset
UTF-8
~$ sort .zsh-history|uniq -D|sed -n l
cd Pyr\202n\202es$
cd Pyr\351n\351es$

The two lines differ only in bytes that are invalid in UTF-8, yet uniq
reports them as identical.

"sort -u" and "comm" also treat them as identical:

~$ echo '\0300\n\0301' | sort -u | sed -n l
\300$
~$ sed -n l a
cd Pyr\202n\202es$
~$ sed -n l b
cd Pyr\351n\351es$
~$ comm -12 a b | sed -n l
cd Pyr\351n\351es$

If this is expected behavior, I think it should be better documented:
the statement "Comparisons honor the rules specified by the
`LC_COLLATE' locale category." is not enough to cover this rather
unintuitive behavior.

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (50, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-trunk-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_US.ISO-8859-15 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages coreutils depends on:
ii  libacl1      2.2.49-1  Access control list shared library
ii  libc6        2.10.2-5  Embedded GNU C Library: Shared lib
ii  libselinux1  2.0.89-4  SELinux runtime shared libraries

coreutils recommends no packages.

coreutils suggests no packages.

-- debconf-show failed
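
For comparison, here is a minimal sketch of the same check with byte-wise comparison forced. It assumes a POSIX shell and GNU uniq (for -D); the file name sample.txt is hypothetical and stands in for the history file above. Setting LC_ALL=C overrides LC_COLLATE, so the invalid bytes are compared as raw bytes rather than through the UTF-8 collation rules.

```shell
# Recreate the two history lines; \202 and \351 are octal escapes for
# bytes that do not form valid UTF-8 sequences here.
# sample.txt is a hypothetical file name used only for this sketch.
printf 'cd Pyr\202n\202es\ncd Pyr\351n\351es\n' > sample.txt

# In the C locale, sort and uniq compare lines byte by byte.
distinct=$(LC_ALL=C sort -u sample.txt | wc -l)
dupes=$(LC_ALL=C sort sample.txt | LC_ALL=C uniq -D | wc -l)

echo "distinct lines: $distinct"
echo "lines reported as duplicates: $dupes"
```

With byte-wise comparison both lines should be kept as distinct and none reported as a duplicate, which is the behavior I expected in the UTF-8 locale as well.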