A question about bash change for efficiency speedups in multibyte locales

2018-03-29 Thread yangyajing
Hello, I recently ran lsb-test-core on Ubuntu 16.04. One of its tests
passes with bash 4.1 but fails with bash 4.3.

The test command is:

    bash tp10.sh

and the content of tp10.sh is:

    echo [[:xdigit:]] [[:hyphen:]]

The test also sets a multibyte locale that defines a character class
named "hyphen".

When I run tp10.sh with bash, "hyphen" is recognized by bash 4.1 but
not by bash 4.3.

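Comparing the source code, I found that bash 4.3 has two more lines
than bash 4.1 in the xstrmatch() function in bash/lib/glob/smatch.c.
The two lines are as follows:

  if (mbsmbchar (string) == 0 && mbsmbchar (pattern) == 0)
    return (internal_strmatch ((unsigned char *)pattern, (unsigned char *)string, flags));
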
Having checked the change log at
https://tiswww.case.edu/php/chet/bash/CHANGES, I suspect this change
was made to speed up matching in multibyte locales.

So I would like to confirm whether my guess is correct, and whether
this change overlooks the case where the pattern contains no multibyte
characters but uses a character class that the multibyte locale defines.

Looking forward to your reply.





Re: A question about bash change for efficiency speedups in multibyte locales

2018-04-03 Thread yangyajing
I don't think it should check for '[:'. Instead, it should check
whether the pattern uses a character class defined by the locale, even
when the pattern contains no multibyte characters.


When the pattern uses one of the character classes defined by POSIX,
such as xdigit, these two lines do indeed make matching faster.


But I think they overlook the case where the pattern contains no
multibyte characters yet uses a character class that only the multibyte
locale defines.
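A rough sketch of the check suggested above, not a patch: scan the pattern for `[:name:]` and report whether `name` falls outside the twelve POSIX class names. `uses_nonposix_class()` and `contains_class_open()` are hypothetical helpers; the latter is the simpler '[:' test from the quoted question. Real bracket-expression parsing (negation, `]` as the first member, escapes) is ignored here.

```c
#include <string.h>

/* The twelve character classes POSIX defines for bracket expressions. */
static const char *const posix_classes[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit"
};

static int is_posix_class_name(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof posix_classes / sizeof posix_classes[0]; i++)
        if (strlen(posix_classes[i]) == len
            && strncmp(posix_classes[i], name, len) == 0)
            return 1;
    return 0;
}

/* The simpler test: does the pattern contain "[:" at all? */
static int contains_class_open(const char *pattern)
{
    return strstr(pattern, "[:") != NULL;
}

/* Hypothetical helper: nonzero if PATTERN names, via [:name:], a class
   that is not a POSIX class (for example a locale-defined one). */
static int uses_nonposix_class(const char *pattern)
{
    const char *p = pattern;

    while ((p = strstr(p, "[:")) != NULL) {
        const char *name = p + 2;
        const char *end = strstr(name, ":]");

        if (end == NULL)
            break;              /* unterminated: not a class expression */
        if (!is_posix_class_name(name, (size_t)(end - name)))
            return 1;
        p = end + 2;
    }
    return 0;
}
```

With a check like this, `[[:xdigit:]]` could keep the fast single-byte path while `[[:hyphen:]]` would go to the wide-character matcher. Testing only for '[:' is simpler but gives up the fast path for every class expression, including the POSIX ones.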


In bash 4.1, which lacks these two lines, internal_wstrmatch (wpattern,
wstring, flags) returns the correct result.

This is because internal_wstrmatch() calls is_wcclass(), which uses
wctype(), and wctype() recognizes character classes defined by the locale.
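A minimal sketch of why the wide-character path works, using a hypothetical helper `wc_class_matches()` (not bash's is_wcclass(), which also parses the bracket expression): wctype() looks the class name up in the current LC_CTYPE locale and returns 0 if the locale does not define it, so any class the locale defines, such as `hyphen`, can then be tested with iswctype().

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if WC is in the class NAME, 0 if it is
   not, and -1 if the current LC_CTYPE locale defines no such class. */
static int wc_class_matches(wint_t wc, const char *name)
{
    wctype_t desc = wctype(name);   /* 0 means "unknown class name" */

    if (desc == 0)
        return -1;
    return iswctype(wc, desc) != 0;
}
```

In the default "C" locale `hyphen` is unknown and the helper returns -1; after setlocale(LC_CTYPE, "") in a locale that defines `hyphen`, it would return 0 or 1 according to that locale's definition.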


Once these two lines are added, however, internal_strmatch() is called
instead, and in this case it returns the wrong result.

That is because internal_strmatch() calls is_class(), and is_class()
only knows about the character classes defined in the POSIX standard.
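By contrast, a matcher that resolves class names against a fixed list can only ever know the POSIX classes. A sketch of that approach with a hypothetical `byte_class_matches()` that maps each POSIX name to the corresponding <ctype.h> function; this is not bash's is_class(), only an illustration of the limitation.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if byte C is in the POSIX class NAME,
   0 if it is not, and -1 if NAME is not a POSIX class. A locale-defined
   class such as "hyphen" always ends up in the -1 case. */
static int byte_class_matches(int c, const char *name)
{
    static const struct {
        const char *name;
        int (*test)(int);
    } classes[] = {
        { "alnum", isalnum }, { "alpha", isalpha }, { "blank", isblank },
        { "cntrl", iscntrl }, { "digit", isdigit }, { "graph", isgraph },
        { "lower", islower }, { "print", isprint }, { "punct", ispunct },
        { "space", isspace }, { "upper", isupper }, { "xdigit", isxdigit },
    };

    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (strcmp(classes[i].name, name) == 0)
            return classes[i].test((unsigned char)c) != 0;
    return -1;
}
```

No matter which locale is active, the lookup table never grows, which is why the single-byte path cannot honor a `charclass` added by the locale definition.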




On 2018-04-03 23:28, Chet Ramey wrote:
> On 4/2/18 10:59 PM, yangyajing wrote:
>> Thanks for your reply.
>>
>> The two lines of code are as follows, in the xstrmatch() function in
>> bash-4.3/lib/glob/smatch.c:
>>
>>   if (mbsmbchar (string) == 0 && mbsmbchar (pattern) == 0)
>>     return (internal_strmatch ((unsigned char *)pattern, (unsigned char *)string, flags));
> Interesting. Do you think it should check for `[:' and not take the
> single-byte character path if that's present?
>
> Chet
>