Re: Bash vi mode's e command (end of word) goes to eol when hitting a unicode character
On 9/3/18 7:13 AM, Enrico Maria De Angelis wrote:

> This is kind of a pedantic bug report.
> Basically it seems that bash's vi-mode doesn't use the same definition of
> words/Words/... that Vim uses (which is de facto the always-installed
> version of vi), but I'm writing to you all the same, just in case it's an
> easy fix (if you think this is really a bug).

Thanks for the report. The readline vi-mode code needs to be updated to
better handle multibyte characters in a few places; this is one.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
built-in regex matches wrong character
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu'
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/local/share/locale'
-DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib
-g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux mamatb-laptop 4.4.0-98-generic #121-Ubuntu SMP
Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 0
Release Status: release

Description:
    It seems that bash's built-in regex matching accepts some symbols
    that it shouldn't. The following commands show this:

    [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
    [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'

Repeat-By:
    I found this while developing a bigger bash script, but it can be
    reproduced with the lines above. Would you reply to me at
    amatba...@gmail.com to let me know whether this is in fact a bug?
    Thanks.
Re: built-in regex matches wrong character
On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:

> Description:
>     It seems like bash built-in regex matches some symbols that shouldn't.
>     The following commands shows this:
>     [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
>     [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
>
> Repeat-By:
>     Actually found out this while developing a bigger bash script, but it
>     can be reproduced with the previous lines. Would you reply me at
>     amatba...@gmail.com to know if this was in fact a bug? Thanks.

Not a bug, but a property of your locale.

POSIX says that range expressions in regular expressions are
implementation-defined except in the C locale, which means [a-b] is free
to match not just the two ASCII characters 'a' and 'b', but anything that
your current locale considers equivalent.

If you run your script with LC_ALL=C in the environment, you won't have
that problem (because there, [a-b] is well-defined to be exactly two
characters). Or, you can use bash's 'shopt -s globasciiranges', which is
supposed to enable Rational Range Interpretation: even in non-C locales, a
character range bounded by two ASCII characters takes on the C locale
definition of only the ASCII characters in that range, rather than the
locale's definition of whatever other characters might also be equivalent.
(Actually, while I know that shopt affects globbing, I don't know if it
also affects regex matching. If it doesn't, that's probably a bug that
should be fixed.)

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
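
[Editor's sketch] The LC_ALL=C suggestion above can be confined to a single
match instead of the whole script. The following is a minimal sketch, not
anything bash ships: the function name ascii_range_match and its two
arguments are hypothetical, and it assumes the script file is saved as
UTF-8. It runs the =~ test in a subshell whose locale has been switched to
C, so the range is interpreted byte by byte and the caller's locale is not
changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): succeed if the string contains a
# character from the given range, with the range interpreted in the C locale.
#   $1 = string to test
#   $2 = range body, for example "a-b"
ascii_range_match() {
    local str=$1 re="[$2]"
    (
        # Assigning LC_ALL inside the subshell changes the locale only for
        # this match; the calling shell keeps its own locale settings.
        LC_ALL=C
        [[ $str =~ $re ]]
    )
}

# Usage: the character from the original report, tested against [a-b].
if ascii_range_match 'ª' 'a-b'; then
    echo 'ª matched [a-b]'
else
    echo 'ª did not match [a-b] in the C locale'
fi
```

The subshell costs a fork per call, so for a hot loop it may be preferable
to set LC_ALL=C once for the whole script, as suggested above. Whether
'shopt -s globasciiranges' has the same effect on =~ is left open in this
thread, so the sketch does not rely on it.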
Re: built-in regex matches wrong character
Thanks for your response, Eric. Please find attached a screenshot of my
tests of both solutions. It seems that setting LC_ALL=C in the environment
works fine, while 'shopt -s globasciiranges' does not (although I could be
testing this the wrong way; this is my first time using shopt).

Regards,
Miguel

On 9/5/18, Eric Blake wrote:
> On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
>
>> Description:
>>     It seems like bash built-in regex matches some symbols that shouldn't.
>>     The following commands shows this:
>>     [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
>>     [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
>>
>> Repeat-By:
>>     Actually found out this while developing a bigger bash script, but it
>>     can be reproduced with the previous lines. Would you reply me at
>>     amatba...@gmail.com to know if this was in fact a bug? Thanks.
>
> Not a bug, but a property of your locale.
>
> POSIX says that range expressions in regular expressions are
> implementation-defined except for in the C locale, which means [a-b] is
> free to match more than just the two ASCII characters 'a' and 'b', but
> rather anything that your current locale considers equivalent.
>
> If you run your script with LC_ALL=C in the environment, you won't have
> that problem (because there, [a-b] is well-defined to be exactly two
> characters). Or, you can use bash's 'shopt -s globasciiranges' which is
> supposed to enable Rational Range Interpretation, where even in non-C
> locales, a character range bounded by two ASCII characters takes on the
> C locale definition of only the ASCII characters in that range, rather
> than the locale's definition of whatever other characters might also be
> equivalent (actually, while I know that shopt affects globbing, I don't
> know if it also affects regex matching - but if it doesn't, that's
> probably a bug that should be fixed).
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>