Re: Locales/sort bug

2010-11-05 Thread Camaleón
On Fri, 05 Nov 2010 00:36:47 +0100, David Jardine wrote:
> On Thu, Nov 04, 2010 at 10:55:53PM +, Camaleón wrote:
(...)
>> Heck, it's even weirder with this sequence:
>>
>> aph3,"z
>> aph3_devel,"a
>> aph3,"b
>>
>> It gets sorted as:
>>
>> aph3,"b
>> aph3_devel,"a
>> aph3,"z
>>
>> I'm tryi

Re: Locales/sort bug

2010-11-04 Thread Bob Proulx
Camaleón wrote:
> I'm trying to "reverse-engineering" the logic behind the sort but I can't
> see it. Maybe it is done randomly? Very curious, indeed.
It is "dictionary" sort ordering as specified by the locale. Case is folded
and punctuation is (mostly) ignored. Personally I always set the fol
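Bob's explanation can be checked without a Spanish or Polish locale installed. The sketch below is an approximation, not glibc's actual collation algorithm: under LC_ALL=C, GNU sort's -d option compares only blanks and alphanumerics and -f folds lower case to upper case, which roughly mimics the first pass a locale such as es_ES.UTF-8 makes (glibc locales typically still use punctuation to break remaining ties). The sample lines are Camaleón's from this thread; the variable names are made up for the example.

```shell
#!/bin/sh
# Camaleón's three sample lines from this thread.
input='aph3,"z
aph3_devel,"a
aph3,"b'

# Traditional byte order: every character counts, and ',' (0x2C)
# sorts before '_' (0x5F).
bytewise=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Rough imitation of locale "dictionary" order (an approximation, not
# glibc's real collation): -d ignores everything except blanks and
# alphanumerics, -f folds lower case to upper case.
dictlike=$(printf '%s\n' "$input" | LC_ALL=C sort -d -f)

printf 'byte order:\n%s\n\ndictionary-like order:\n%s\n' "$bytewise" "$dictlike"
```

The dictionary-like list comes out as aph3,"b / aph3_devel,"a / aph3,"z, the order Camaleón reported: with the punctuation dropped, the lines compare as aph3b, aph3devela and aph3z.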

Re: Locales/sort bug

2010-11-04 Thread David Jardine
On Thu, Nov 04, 2010 at 10:55:53PM +, Camaleón wrote:
> On Thu, 04 Nov 2010 21:23:27 +0100, Rob Gom wrote:
>
> > [cut]
> >>
> >> I'm also getting that behaviour (locale set to "es_ES.UTF-8") so I
> >> understand that my locale setting dictates "underscore" ("_") comes
> >> before "comma" (

Re: Locales/sort bug

2010-11-04 Thread Camaleón
On Thu, 04 Nov 2010 21:23:27 +0100, Rob Gom wrote:
> [cut]
>>
>> I'm also getting that behaviour (locale set to "es_ES.UTF-8") so I
>> understand that my locale setting dictates "underscore" ("_") comes
>> before "comma" (",") symbol.
>>
>> As per "man sort" page:
>>
>> *** WARNING *** The loc

Re: Locales/sort bug

2010-11-04 Thread Rob Gom
I have some form of workaround. When I know the sort field separator (which
was the case in my original example), I can use it to overcome the
limitations with:
$ LC_ALL=pl_PL.UTF-8 sort -k1,1 -t',' test.csv
aph3,"APP",""
aph3,"MiB",""
aph3_devel,"TXT","" # everything fine
$ LC_ALL=pl_PL.UTF-8 sort
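Why Rob's workaround helps: with -t',' -k1,1 the comparison key is only the first field, so the lines compare as aph3 versus aph3_devel instead of as whole lines. Even when punctuation is ignored, aph3 is a prefix of aph3devel and therefore sorts first; the two aph3 lines tie on the key and are then ordered by comparing the entire lines, which sort does by default unless -s is given. The sketch below is a hedged approximation that does not need pl_PL.UTF-8 installed: it stands in for the locale with GNU sort's -d (ignore everything but blanks and alphanumerics) and -f (fold case) under LC_ALL=C. The data is Rob's test.csv, fed in through a pipe; the variable names are illustrative.

```shell
#!/bin/sh
# Rob's test.csv contents from this thread, kept in a variable so no
# file needs to exist.
csv_lines='aph3,"APP",""
aph3_devel,"TXT",""
aph3,"MiB",""'

# Approximation of locale dictionary order on the whole line:
# -d ignores everything except blanks and alphanumerics, -f folds case.
whole_line=$(printf '%s\n' "$csv_lines" | LC_ALL=C sort -d -f)

# Same ordering options, but the key is only the first comma-separated
# field (-t',' -k1,1). Lines whose keys tie are then compared as whole
# lines, as sort does by default without -s.
first_field=$(printf '%s\n' "$csv_lines" | LC_ALL=C sort -d -f -t',' -k1,1)

printf 'whole line:\n%s\n\nfirst field only:\n%s\n' "$whole_line" "$first_field"
```

The first-field result is the same order Rob got from LC_ALL=pl_PL.UTF-8 sort -k1,1 -t',' test.csv, while the whole-line result puts aph3_devel between the two aph3 lines. As Rob notes, the trick depends on knowing the field separator.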

Re: Locales/sort bug

2010-11-04 Thread Rob Gom
One more thing. If I set LC_COLLATE to C/POSIX, special characters sort fine, but I lose the Polish character ordering. If I set LC_COLLATE to pl_PL.UTF-8, Polish characters are ordered correctly, but sorting goes crazy with special characters. Is it possible to retain both features?

Re: Locales/sort bug

2010-11-04 Thread Rob Gom
[cut]
>
> This is covered by the coreutils FAQ:
> http://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
>
> Sven
>
Thanks for all the answers.
How can I tell whether the collation is defined correctly? I understand how
LC_COLLATE influences the sort operation, but

Re: Locales/sort bug

2010-11-04 Thread Sven Joachim
On 2010-11-04 20:29 +0100, Rob Gom wrote:
> Hi all,
> do you think it's a bug in either libc or coreutils (sort)?
>
> $ cat test.csv
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=C sort test.csv # expected
> aph3,"APP",""
> aph3,"MiB",""
> aph3_devel,"TXT",""
>
> $ LC_ALL=pl_P

Re: Locales/sort bug

2010-11-04 Thread Rob Gom
[cut]
>
> I'm also getting that behaviour (locale set to "es_ES.UTF-8") so I
> understand that my locale setting dictates "underscore" ("_") comes
> before "comma" (",") symbol.
>
> As per "man sort" page:
>
> *** WARNING *** The locale specified by the environment affects sort
> order. Set LC_

Re: Locales/sort bug

2010-11-04 Thread Ron Johnson
On 11/04/2010 02:29 PM, Rob Gom wrote:
> Hi all,
> do you think it's a bug in either libc or coreutils (sort)?
> $ cat test.csv
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
> $ LC_ALL=C sort test.csv # expected
> aph3,"APP",""
> aph3,"MiB",""
> aph3_devel,"TXT",""
> $ LC_ALL=pl_PL sort test.csv # why is t

Re: Locales/sort bug

2010-11-04 Thread Camaleón
On Thu, 04 Nov 2010 20:29:02 +0100, Rob Gom wrote:
> do you think it's a bug in either libc or coreutils (sort)?
>
> $ cat test.csv
> aph3,"APP",""
> aph3_devel,"TXT",""
> aph3,"MiB",""
>
> $ LC_ALL=C sort test.csv # expected
> aph3,"APP",""
> aph3,"MiB",""
> aph3_devel,"TXT",""
>
> $ LC_ALL=pl