Re: locale specific ordering in EN_US -- why is a

Aharon Robbins Mon, 21 May 2012 12:30:34 -0700

In article <[email protected]>,
Linda Walsh  <[email protected]> wrote:
>Greg Wooledge wrote:
>
>> On Sun, May 20, 2012 at 11:36:35AM -0700, Linda Walsh wrote:
>
>> For instance, on HP-UX 10.20, in the en_US.iso88591 locale:
>>     A  a  ...  B  b
>> Meanwhile, on Debian 6.0, in the en_US.iso88591 locale:
>>     a A   ...  b B
>> 
>> As you can see, the two en_US.iso88591 implementations are not the same.
>
>----
>       Great!...
>
>So which is correct?


Both!  Isn't this fun!  Current POSIX leaves this up to the implementation.
I believe that the Debian order is what earlier POSIX required.

>Anyone wanting to reference an upper or lower case range
>[a-z] or [A-Z], is gonna hurt from this.

This is why I started the Campaign For Rational Range Interpretation,
which is now part of gawk and, I believe, the most recent grep as well.
It returns us to the sane days of yesteryear, where [a-z] got only
lowercase letters and [A-Z] got only uppercase ones.
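
A minimal sketch of the difference, assuming a POSIX shell with grep and
tr available. The variable names are only for illustration. Under the C
locale a range follows byte values, so [a-z] should match exactly the
ASCII lowercase letters. The [[:lower:]] class is the portable spelling
when you cannot control the locale.

```shell
# Feed four single-letter lines to grep and keep only those in [a-z].
# LC_ALL=C is set for the grep command alone, not for the whole shell.
lower_only=$(printf '%s\n' a B c D | LC_ALL=C grep '^[a-z]$' | tr -d '\n')

# Same filter using the POSIX character class, which does not depend
# on how the current locale orders its letters.
class_only=$(printf '%s\n' a B c D | grep '^[[:lower:]]$' | tr -d '\n')

echo "range: $lower_only"
echo "class: $class_only"
```

Both variables should end up holding "ac". Run the first grep without
LC_ALL=C in a locale that interleaves upper and lower case, and an
implementation without rational range interpretation may also let some
uppercase letters through.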

>My OS uses "en_US.UTF-8".

I personally have had

        export LC_ALL=C

in my .profile / .bashrc for many years now, to keep the behavior G-d
intended.
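
For anyone who would rather not change the whole login environment, a
hedged sketch of the same idea applied to a single command (assuming a
POSIX shell with sort and tr; the variable name is hypothetical):

```shell
# Sort four letters with the C locale applied to this one sort only.
# In the C locale, sort compares byte values, so all ASCII uppercase
# letters come before all lowercase ones.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort | tr -d '\n')

echo "$sorted"   # prints ABab
```

Scoping LC_ALL=C to one command keeps messages and date formats in your
own locale everywhere else, at the cost of having to remember it each
time.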

>You'd think unicode would have something to say about collation
>order that wouldn't allow such randomness, but maybe not.

It actually makes sense that it doesn't, since Unicode is more or less
a mapping of code points to glyphs, which is language independent. The
rules for collating depend upon the language.
-- 
Aharon (Arnold) Robbins                         arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381
Nof Ayalon              Cell Phone: +972 50 729-7545
D.N. Shimshon 99785     ISRAEL
