Re: [A-Z], [:upper:]

Greg Wooledge Fri, 29 Mar 2019 05:43:02 -0700

>   | [A-Z] isn't safe to use unless ...
> 
> That's true to an extent, but we know here that the intent is to
> match 'C' which is between A and Z in every locale in the universe.
> Variations on A might not be, variations on Z might not be, and there
> might be more than just the upper case English letters between A and Z,
> included in the range (even including things which are not letters at
> all, upper case or not, and lower case chars might be included) but we
> can assume that for any real locale, 'C' will be in that range (real as
> being one in use in the world, rather than one invented for the very
> purpose of not including C in the collating sequence between A and Z)


So, embracing and extending your assumptions, we can also claim that
the letter T is between A and Z in every locale in the universe, right?

wooledg:~$ printf %s\\n {A..Z} | LC_COLLATE=et_EE.utf8 sort | tr '\n' ' '
A B C D E F G H I J K L M N O P Q R S Z T U V W X Y 

Isn't real life FUN?

But perhaps you're right about the letter C specifically.  Maybe that
one letter just happens to lie between A and Z in every locale on Earth.
I don't happen to know of any counter-examples... yet.

Now, for the original poster: the meaning of [A-Z] and [a-z] did in
fact change between bash 4 and bash 5.

wooledg:~$ bash-4.4 -c 'LC_COLLATE=et_EE.utf8; [[ T = [A-Z] ]] && echo match'
wooledg:~$ bash-5.0 -c 'LC_COLLATE=et_EE.utf8; [[ T = [A-Z] ]] && echo match'
match

This is yet one more reason you can't rely on [A-Z] or [a-z] to work
as expected in scripts.  Even between different versions of bash, within
the same locale, on the same computer, it doesn't behave consistently.

I strongly recommend switching to [[:upper:]] and friends, unless you
always work in the C locale (and explicitly set it in your scripts).
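
As a minimal sketch of both options (the function names is_upper and
is_ascii_upper are made up for illustration, not anything bash provides),
something like this should behave the same regardless of the caller's
locale or bash version:

```shell
#!/usr/bin/env bash
# Sketch only: is_upper and is_ascii_upper are hypothetical helper names.

# True if $1 is a single uppercase letter according to the current locale.
is_upper() {
  [[ $1 = [[:upper:]] ]]
}

# True if $1 is one of the 26 ASCII letters A-Z, whatever the caller's
# locale is.  LC_ALL is overridden only inside the function; because it
# is declared local, the previous setting comes back on return.
is_ascii_upper() {
  local LC_ALL=C
  [[ $1 = [A-Z] ]]
}

is_upper T && echo "T is uppercase"
is_ascii_upper T && echo "T is in A-Z under the C locale"
is_ascii_upper t || echo "t is not in A-Z under the C locale"
```

If the whole script only ever deals with ASCII, putting a single
"export LC_ALL=C" near the top is simpler than scoping it per function.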
