In article <[email protected]>,
Linda Walsh <[email protected]> wrote:
>Greg Wooledge wrote:
>
>> On Sun, May 20, 2012 at 11:36:35AM -0700, Linda Walsh wrote:
>
>> For instance, on HP-UX 10.20, in the en_US.iso88591 locale:
>> A a ... B b
>> Meanwhile, on Debian 6.0, in the en_US.iso88591 locale:
>> a A ... b B
>>
>> As you can see, the two en_US.iso88591 implementations are not the same.
>
>----
> Great!...
>
>So which is correct?
Both! Isn't this fun! Current POSIX leaves this up to the implementation.
I believe that the Debian order is what earlier POSIX required.
>Anyone wanting to reference an upper or lower case range
>[a-z] or [A-Z], is gonna hurt from this.
This is why I started the Campaign For Rational Range Interpretation,
now part of gawk and, I believe, the most recent grep as well. It
returns us to the sane days of yesteryear, where [a-z] matched only
lowercase letters and [A-Z] matched only uppercase ones.
>My OS uses "en_US.UTF-8".
I personally have had
export LC_ALL=C
in my .profile / .bashrc for many years now, to keep the behavior G-d
intended.
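
For anyone who wants to see the effect without touching their startup
files, here is a minimal sketch. The helper name is_lower is made up
for illustration; the only thing it relies on is that LC_ALL=C gives
byte-value (ASCII) ordering for ranges. The function body is a subshell,
so the setting does not leak into the calling shell.

```shell
# is_lower is a hypothetical helper, not something from gawk, grep, or bash.
# It reports whether its argument is a single character in [a-z],
# evaluated under the C locale.
is_lower() (
    # Subshell body: this assignment is gone once the function returns.
    LC_ALL=C
    export LC_ALL
    case "$1" in
        [a-z]) echo yes ;;
        *)     echo no ;;
    esac
)

is_lower b    # prints: yes
is_lower B    # prints: no

# The same idea for a single command, without exporting anything:
printf 'a\nB\n' | LC_ALL=C grep '[a-z]'    # prints: a
```

What you get for the uppercase letter without LC_ALL=C depends on your
shell, your tools, and your locale data, which is exactly the problem.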
>You'd think unicode would have something to say about collation
>order that wouldn't allow such randomness, but maybe not.
It actually makes sense that it doesn't, since Unicode is more or less
a mapping of code points to glyphs, which is language independent. The
rules for collating depend upon the language.
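
The one ordering you can count on everywhere is the C locale's, which
is plain byte value. A small sketch, with c_sort being a name I made up
for the example:

```shell
# c_sort is a hypothetical helper: it sorts its arguments, one per line,
# by byte value, regardless of the caller's locale settings.
c_sort() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

c_sort b A a B
# prints, one per line: A B a b
```

Drop the LC_ALL=C and the result depends on the locale definitions
installed on the system, as the HP-UX and Debian examples quoted above
show.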
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL