On Fri, Sep 14, 2012 at 1:49 AM, Marcel Giannelia <i...@skeena.net> wrote:
> I believe I've found an inconsistency in bash or its documentation.
>
> I know the fact that things like [a-c] are highly locale-dependent in
> bash (doesn't mean I have to like it, but there it is). Fine. I've
> learned to live with it.
>
> But the other day I was on a fresh install (hadn't set
> LC_COLLATE=C yet, so I was in en_US.UTF-8), and this happened:
>
> $ touch {a..c}
> $ ls
> a b c
> $ touch {A..C}
> $ ls
> a A b B c C
> $ ls {a..c}
> a b c
> $ ls [a-c]
> a A b B c
>
> Curly brace range expressions behave differently from square-bracket
> ranges. Is this intentional? This is under Arch Linux, bash version
> "4.2.37(2)-release (i686-pc-linux-gnu)".
>
> The man page seems to imply that the curly brace behaviour above is a
> bug:
>
> "When characters are supplied, the expression expands to each character
> lexicographically between x and y, inclusive."
>
> ...although this documentation suffers from the same problem as the
> passage about character class ranges, namely that it confuses
> lexicographic sort order (character collation *weights*) with
> character collation *sequence values* (they are not quite the same thing
> -- if they were, 'c' and 'C' would *always always always* appear
> together in a range expansion, because:
> $ touch aa B cd C
> $ ls -1
> aa
> B
> C
> cd
> ). The phrases "sorts between" and "lexicographically between" refer to
> collation *weights*, but bash clearly uses sequence values.
>
> It's a subtle distinction; I beat it to death in a thread
> from 2011, subject "documentation bug re character range expressions",
> but I don't think the documentation actually got changed.
>
> It seems the thinking goes something like, "since no one is supposed to
> use expressions like [a-c], we don't have to precisely
> document, care, or even *know* what it means" -- a shame, because with
> LC_COLLATE=C set, [a-c] is actually quite useful, and in all other
> locales it isn't useful at all (it would be slightly useful if it used
> weights like the documentation says because then it would be like a
> case-insensitive range, but with it using sequence values instead, it's
> useless).
>
> The sheer number of threads we've got complaining about
> locale-dependent [a-c] suggests to me that the software should be
> changed to just do what people expect, especially since nothing is
> really lost by doing so.
>
> Oh well. Dead horses and all that -- but can we at least make the dead
> horses consistent? :)
>
> ~Felix.
>
http://mywiki.wooledge.org/locale
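For anyone who lands on this thread: a minimal sketch, assuming bash and the always-available C locale, showing that once LC_ALL=C is in effect the bracket range [a-c] matches the same names that the brace range {a..c} produces. The function names brace_range and glob_range and the scratch directory are made up for illustration. Note that the locale has to be assigned as its own statement before the glob is expanded; a prefix like `LC_ALL=C ls [a-c]` does not help, because the glob is expanded before the temporary assignment takes effect.

```shell
#!/usr/bin/env bash
# Sketch: compare a brace range with a bracket (glob) range.
# Assumption: the glob is evaluated under LC_ALL=C, where bracket
# ranges follow byte order, just as brace ranges always do.

# Brace expansion steps through code points: a, b, c.
brace_range() {
    echo {a..c}
}

# Bracket range evaluated in a scratch directory holding a..c and A..C.
glob_range() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        LC_ALL=C            # bash calls setlocale() when LC_ALL is assigned;
                            # the subshell keeps the change from leaking out
        cd "$dir" || exit 1
        touch a b c A B C
        echo [a-c]          # under C, matches only a, b and c
    )
    rm -rf -- "$dir"
}

echo "brace: $(brace_range)"
echo "glob:  $(glob_range)"
```

Under other locales the glob result can differ (the report above saw `a A b B c` under en_US.UTF-8 with bash 4.2), which is why the usual advice is LC_COLLATE=C, or an explicit list such as [abc], or a class such as [[:lower:]]. As far as I know, bash 4.3 and later also offer `shopt -s globasciiranges` for C-style bracket ranges without touching the locale; check `help shopt` on your version.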