On 19/02/2012 01:00, James Cloos wrote:
"KM" == Kerin Millar <kerfra...@gmail.com> writes:

KM>  Arch also used to define LC_COLLATE="C" by default, probably to
KM>  mitigate unpredictable behaviour in some applications, but have
KM>  since dropped this additional variable so they must have deemed it
KM>  no longer necessary.

Without LC_COLLATE="C", things like [a-z]* get a false-positive match
on files like Makefile.

Indeed, character classes are a potential minefield. Incidentally, I just tested Ubuntu and Arch with only LANG set to a UTF-8 locale:-

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules ignored
M

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

In neither case are the collation rules being obeyed. In Gentoo, however:-

$ echo Makefile | sed -re 's/[a-z]//g' # collation rules obeyed

$ echo Makefile | grep -Eo '[a-z]*' # collation rules ignored
akefile

Obeying the collation rules is ostensibly the correct thing to do but, until everyone starts using named character classes such as [[:lower:]] (which will never happen), it's not safe. The thing that worries me here is the inconsistency in Gentoo, where sed obeys the collation rules but grep does not. LC_COLLATE="C" is sufficient to work around the issue, but the fact that Ubuntu and Arch behave predictably without it makes me wonder why we still need it.
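
For illustration, here is a minimal sketch of the two workarounds mentioned above. The function names strip_lower_c and strip_lower_class are hypothetical, made up for this example. The first pins the locale to C for the duration of the sed call; it uses LC_ALL rather than LC_COLLATE purely as an assumption that nothing else in the environment should be able to override it. The second leaves the locale alone and relies on a named character class instead of a range.

```shell
# Hypothetical helper: delete lowercase letters with the locale pinned to C,
# so that [a-z] is a plain range from 'a' to 'z' and never includes 'M'.
# LC_ALL is used here (an assumption) because it overrides LC_COLLATE and LANG.
strip_lower_c() {
    LC_ALL=C sed -E 's/[a-z]//g'
}

# Hypothetical helper: delete lowercase letters using a named character class,
# which does not depend on how the current locale orders its characters.
strip_lower_class() {
    sed -E 's/[[:lower:]]//g'
}

echo Makefile | strip_lower_c
echo Makefile | strip_lower_class
```

Both calls should leave only the capital M, whatever LANG happens to be set to.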

--Kerin

