On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
Description:
It seems that bash's built-in regex matching matches some symbols that
it shouldn't. The following commands show this:
[[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] &&
echo 'º between o and p but none of them'
[[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] &&
echo 'ª between a and b but none of them'
Repeat-By:
I actually found this while developing a bigger bash script, but it
can be reproduced with the lines above. Could you reply to me at
amatba...@gmail.com to let me know whether this is in fact a bug? Thanks.
Not a bug, but a property of your locale.
POSIX leaves range expressions in regular expressions unspecified
outside the C (POSIX) locale, which means [a-b] is free to match more
than just the two ASCII characters 'a' and 'b': it can match anything
that your current locale collates between them.
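If you want to see what your own locale does with a range, something
like the following rough sketch may help. The function name
range_extras is made up for illustration; it just prints each candidate
character that the bracket expression matches as a regex under whatever
locale is currently in effect, so the output is expected to differ
between locales.

```shell
# range_extras RANGE CHAR...
# Hypothetical helper: print each CHAR that the bracket expression RANGE
# matches, as an anchored regex, in the locale currently in effect.
range_extras() {
    local range=$1 ch
    shift
    for ch in "$@"; do
        # $range is left unquoted on purpose so it is treated as a regex.
        [[ $ch =~ ^${range}$ ]] && printf '%s\n' "$ch"
    done
    return 0
}
```

For example, "range_extras '[a-b]' a b ª A B" lists whichever of those
five characters your locale puts inside [a-b].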
If you run your script with LC_ALL=C in the environment, you won't have
that problem, because there [a-b] is well-defined to be exactly two
characters. Or, you can use bash's 'shopt -s globasciiranges', which is
supposed to enable Rational Range Interpretation: even in non-C
locales, a character range bounded by two ASCII characters takes on the
C locale definition of only the ASCII characters in that range, rather
than the locale's definition of whatever other characters might also
collate there. (While I know that shopt affects globbing, I don't know
if it also affects regex matching; if it doesn't, that's probably a bug
that should be fixed.)
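A minimal sketch of both workarounds, in case it helps. The function
names match_c and glob_match are invented for this example, and it
assumes bash 4.3 or later for globasciiranges. The shopt part is only
exercised with glob patterns here, since I am not asserting anything
about its effect on =~.

```shell
# Workaround 1: do the regex match under the C locale. 'local' scopes
# the LC_ALL override to this function call only.
match_c() {
    local LC_ALL=C
    [[ $1 =~ $2 ]]    # $2 unquoted so it is treated as a regex
}

# Workaround 2 (glob patterns): with globasciiranges set, a range such
# as [a-c] in a glob uses C-locale ordering even in other locales.
# Assumes bash 4.3 or later.
shopt -s globasciiranges
glob_match() {
    [[ $1 == $2 ]]    # $2 unquoted so it is treated as a glob pattern
}
```

With these, "match_c 'º' '[o-p]'" should fail while "match_c o '[o-p]'"
succeeds, and "glob_match B '[a-c]'" should fail while
"glob_match b '[a-c]'" succeeds.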
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org