Re: Bash vi mode's e command (end of word) goes to eol when hitting a unicode character

2018-09-05 Thread Chet Ramey
On 9/3/18 7:13 AM, Enrico Maria De Angelis wrote:
> This is kind of a pedantic bug report.
> Basically it seems that bash's vi-mode doesn't use the same definition of
> words/Words/... that Vim uses (which is de facto the always-installed
> version of vi), but I'm writing to you anyway, just in case the fix is an
> easy one (if you think this is really a bug).

Thanks for the report. The readline vi-mode code needs to be updated to
better handle multibyte characters in a few places; this is one.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



built-in regex matches wrong character

2018-09-05 Thread mamatb
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' 
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/local/share/locale' -DPACKAGE='bash' 
-DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib   -g -O2 
-Wno-parentheses -Wno-format-security
uname output: Linux mamatb-laptop 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 
14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 0
Release Status: release

Description:
It seems that bash's built-in regex matching matches some symbols that it
shouldn't. The following commands show this:
[[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] &&
echo 'º between o and p but none of them'
[[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] &&
echo 'ª between a and b but none of them'

Repeat-By:
I actually found this while developing a larger bash script, but it
can be reproduced with the lines above. Could you reply to me at
amatba...@gmail.com to let me know whether this is in fact a bug? Thanks.



Re: built-in regex matches wrong character

2018-09-05 Thread Eric Blake

On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:


> Description:
> It seems that bash's built-in regex matching matches some symbols that it
> shouldn't. The following commands show this:
> [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] &&
> echo 'º between o and p but none of them'
> [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] &&
> echo 'ª between a and b but none of them'
>
> Repeat-By:
> I actually found this while developing a larger bash script, but it
> can be reproduced with the lines above. Could you reply to me at
> amatba...@gmail.com to let me know whether this is in fact a bug? Thanks.


Not a bug, but a property of your locale.

POSIX says that range expressions in regular expressions are
implementation-defined except in the C locale, which means [a-b] is
free to match not just the two ASCII characters 'a' and 'b', but also
anything else that your current locale considers part of that range.


If you run your script with LC_ALL=C in the environment, you won't have 
that problem (because there, [a-b] is well-defined to be exactly two 
characters).  Or, you can use bash's 'shopt -s globasciiranges' which is 
supposed to enable Rational Range Interpretation, where even in non-C 
locales, a character range bounded by two ASCII characters takes on the 
C locale definition of only the ASCII characters in that range, rather 
than the locale's definition of whatever other characters might also be 
equivalent (actually, while I know that shopt affects globbing, I don't 
know if it also affects regex matching - but if it doesn't, that's 
probably a bug that should be fixed).
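
The LC_ALL=C suggestion above can also be confined to a single test instead
of the whole script. A minimal sketch, assuming bash; the helper name
match_in_c_locale is hypothetical and not part of bash, and because it runs
in a subshell the caller does not get BASH_REMATCH back:

```shell
#!/usr/bin/env bash

# Hypothetical helper: run one regex match with the C locale in effect,
# so that a range such as [a-b] covers only the ASCII characters a and b.
# The function body is a subshell, so the locale change cannot leak into
# the rest of the script.
match_in_c_locale() (
    LC_ALL=C
    [[ $1 =~ $2 ]]    # $2 is left unquoted so it is treated as a regex
)

if match_in_c_locale 'ª' '^[a-b]$'; then
    echo 'ª matched [a-b] in the C locale'
else
    echo 'ª did not match [a-b] in the C locale'
fi

# Listing the characters explicitly avoids the range, and therefore the
# locale's ordering, altogether.
if [[ 'ª' =~ ^[ab]$ ]]; then
    echo 'ª matched [ab]'
else
    echo 'ª did not match [ab]'
fi
```

The explicit list [ab] is the more portable choice when the set of wanted
characters is small; the subshell variant is for cases where a real range
is needed.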


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: built-in regex matches wrong character

2018-09-05 Thread Miguel Amat
Thanks for your response Eric, please find my attached screenshot
testing both solutions. Seems like setting LC_ALL=C in the environment
works fine while 'shopt -s globasciiranges' does not (also I could be
testing this the wrong way, first time using shopt).

Regards,
Miguel
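
For anyone repeating the comparison without the screenshot: the bash manual
describes globasciiranges as applying to range expressions in pattern
matching, so one way to check each mechanism separately is to test the same
range once with == (pattern matching) and once with =~ (regex). A rough
sketch; the result depends on the bash version and the current locale, so
no output is shown, and report is a hypothetical helper:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print whether a test succeeded.
# usage: report <exit status> <label>
report() {
    if (( $1 == 0 )); then
        echo "$2: match"
    else
        echo "$2: no match"
    fi
}

# Available since bash 4.3; older versions print an error here and continue.
shopt -s globasciiranges

# Same range, two different matchers.
[[ 'ª' == [a-b] ]];    report $? 'pattern [a-b]'
[[ 'ª' =~ ^[a-b]$ ]];  report $? 'regex   [a-b]'
```

If the two lines disagree, the option is influencing pattern matching but
not the regex operator in that build.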

On 9/5/18, Eric Blake wrote:
> On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
>
>> Description:
>> It seems that bash's built-in regex matching matches some symbols that it
>> shouldn't. The following commands show this:
>> [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] &&
>> echo 'º between o and p but none of them'
>> [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] &&
>> echo 'ª between a and b but none of them'
>>
>> Repeat-By:
>> I actually found this while developing a larger bash script, but it
>> can be reproduced with the lines above. Could you reply to me at
>> amatba...@gmail.com to let me know whether this is in fact a bug? Thanks.
>
> Not a bug, but a property of your locale.
>
> POSIX says that range expressions in regular expressions are
> implementation-defined except in the C locale, which means [a-b] is
> free to match not just the two ASCII characters 'a' and 'b', but also
> anything else that your current locale considers part of that range.
>
> If you run your script with LC_ALL=C in the environment, you won't have
> that problem (because there, [a-b] is well-defined to be exactly two
> characters).  Or, you can use bash's 'shopt -s globasciiranges' which is
> supposed to enable Rational Range Interpretation, where even in non-C
> locales, a character range bounded by two ASCII characters takes on the
> C locale definition of only the ASCII characters in that range, rather
> than the locale's definition of whatever other characters might also be
> equivalent (actually, while I know that shopt affects globbing, I don't
> know if it also affects regex matching - but if it doesn't, that's
> probably a bug that should be fixed).
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>