On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> > characters to stdout should use UTF-8. That's what LC_TYPE means.
>
> So, "cat", "grep", etc. are all broken. :)

How come? "cat" will, for any valid UTF-8 character on input, print a
valid UTF-8 character on output. For any valid ISO-8859-1 character on
input, it will print a valid ISO-8859-1 character on output.

"grep" on the other hand has to actually understand the encoding -- and
it does. Try this:

$ echo "ą"|LC_CTYPE=C grep --color=always .

Will be mangled.

$ echo "ą"|LC_CTYPE=en_US.utf-8 grep --color=always .

Will be handled correctly.

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.
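
P.S. A rough Python sketch of the same distinction, in case it helps. It
is only an illustration under stated assumptions: the helper names
cat() and count_dot_matches() are made up for this example, decoding as
Latin-1 merely stands in for the C locale's one-byte-per-character view
(it is not how grep works internally), and ISO-8859-2 is picked for the
second sample because that charset contains "ą".

```python
import re


def cat(data: bytes) -> bytes:
    """Pass bytes through untouched, the way cat does; nothing is decoded."""
    return data


def count_dot_matches(data: bytes, encoding: str) -> int:
    """Count how many characters the regex '.' matches after decoding."""
    text = data.decode(encoding)
    return len(re.findall(".", text))


utf8_input = "ą".encode("utf-8")         # two bytes: 0xC4 0x85
latin2_input = "ą".encode("iso-8859-2")  # a single byte

# A byte-for-byte copy keeps valid input valid, whatever the encoding.
assert cat(utf8_input) == utf8_input
assert cat(latin2_input) == latin2_input

# Matching '.' depends on how the bytes are interpreted.
print(count_dot_matches(utf8_input, "latin-1"))  # 2: each byte is a "character"
print(count_dot_matches(utf8_input, "utf-8"))    # 1: one two-byte character
```

With the byte-per-character view, "." matches each half of the UTF-8
sequence separately, so anything inserted between matches (such as
color escapes) splits the character. With the UTF-8 view, the two bytes
are matched together as one character.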