Re: [PATCH v3] wildmatch: properly fold case everywhere

Duy Nguyen Wed, 29 May 2013 06:52:43 -0700

On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <[email protected]> wrote:
> On 29 May 2013, at 15:22, Duy Nguyen wrote:
>
>> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <[email protected]> wrote:
>>> Case folding is not done correctly when matching against the [:upper:]
>>> character class and uppercased character ranges (e.g. A-Z).
>>> Specifically, an uppercase letter fails to match against any of them
>>> when case folding is requested because plain characters in the pattern
>>> and the whole string are preemptively lowercased to handle the base case
>>> fast.
>>
>> I did a little test with glibc fnmatch and also checked the source
>> code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a
>> correct behavior or a bug in glibc. The spec is not clear (I think) on
>> this. I guess we should just assume that 'a' should match '[:upper:]'?
>
> I don't know. In my opinion, if case folding is enabled we should say
> [:upper:], [:lower:] and [:alpha:] are equivalent.
>
> This opinion is shared by GNU Flex [1]:
>
>>       • If your scanner is case-insensitive (the ‘-i’ flag), then 
>> ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’.
>
> [1] http://flex.sourceforge.net/manual/Patterns.html


Then we should do it too because of this precedent, I think.

>>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>                                        }
>>>                                        if (t_ch <= p_ch && t_ch >= prev_ch)
>>>                                                matched = 1;
>>> +                                       else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
>>> +                                               uchar t_ch_upper = toupper(t_ch);
>>> +                                               if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>>> +                                                       matched = 1;
>>> +                                       }
>>
>> Or we could stick with tolower. Something like this:
>>
>> if ((t_ch <= p_ch && t_ch >= prev_ch) ||
>>   ((flags & WM_CASEFOLD) &&
>>      t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch)))
>>   matched = 1;
>>
>> I think it's easier to read if we either downcase all, or upcase all, not 
>> both.
>
> If the range to match against is [A-_], it will become [a-_] which is an 
> empty range, ord('a') > ord('_'). I think it is simpler to reuse toupper() 
> after the fact as I did.
>
> Anyway maybe I should add a test for that corner case?

Yeah I was thinking about such a case, but I saw glibc do it... I
guess we just found another bug, at least in compat/fnmatch.c. Yes a
test for it would be great, in case I change my mind 2 years from now
and decide to turn it the other way ;)
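
For reference, a minimal standalone sketch of the [A-_] corner case
discussed above. It is not code from wildmatch.c: the helper names
(in_range, match_fold_endpoints, match_fold_text) are made up for the
illustration, and it assumes ASCII and the C locale. The text character
is taken to be already lowercased, as the commit message describes.

```c
#include <ctype.h>

typedef unsigned char uchar;

/* Hypothetical helper: is t_ch inside the inclusive range [lo-hi]? */
static int in_range(uchar t_ch, uchar lo, uchar hi)
{
	return lo <= t_ch && t_ch <= hi;
}

/*
 * Approach A: fold the range endpoints with tolower().
 * For [A-_] this turns the range into [a-_], which is empty
 * because 'a' (97) sorts after '_' (95).
 */
static int match_fold_endpoints(uchar t_ch, uchar lo, uchar hi)
{
	return in_range(t_ch, lo, hi) ||
	       in_range(t_ch, (uchar)tolower(lo), (uchar)tolower(hi));
}

/*
 * Approach B: leave the endpoints alone and retry with the
 * uppercased text character, as the patch does.
 */
static int match_fold_text(uchar t_ch, uchar lo, uchar hi)
{
	if (in_range(t_ch, lo, hi))
		return 1;
	return islower(t_ch) && in_range((uchar)toupper(t_ch), lo, hi);
}
```

With the range [A-_] and the text character 'b', the first function
reports no match and the second reports a match. For [A-Z] both agree.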

>
>>>                                        p_ch = 0; /* This makes "prev_ch" get set to 0. */
>>>                                } else if (p_ch == '[' && p[1] == ':') {
>>>                                        const uchar *s;
>>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>                                        } else if (CC_EQ(s,i, "upper")) {
>>>                                                if (ISUPPER(t_ch))
>>>                                                        matched = 1;
>>> +                                               else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
>>> +                                                       matched = 1;
>>>                                        } else if (CC_EQ(s,i, "xdigit")) {
>>>                                                if (ISXDIGIT(t_ch))
>>>                                                        matched = 1;
>>
>> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then?
>
> Yes, isalpha() is enough, but I wanted to keep the two cases separate. I can
> amend that if you want.

Either way is fine. I don't think this code is performance critical. Your call.
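
To make the isalpha() variant concrete, here is a rough sketch of the
character-class rule we seem to agree on, following the Flex precedent.
It is not the patch itself: match_class is a hypothetical helper, the
WM_CASEFOLD value is assumed for the example, and only three classes are
handled.

```c
#include <ctype.h>
#include <string.h>

#define WM_CASEFOLD 1 /* value assumed for this illustration */

/*
 * Hypothetical helper: does t_ch belong to the named class?
 * With WM_CASEFOLD set, "upper" and "lower" both behave like
 * "alpha", so a single isalpha() test covers them.
 */
static int match_class(const char *name, unsigned char t_ch,
		       unsigned int flags)
{
	int fold = (flags & WM_CASEFOLD) != 0;

	if (!strcmp(name, "alpha"))
		return isalpha(t_ch) != 0;
	if (!strcmp(name, "upper"))
		return fold ? isalpha(t_ch) != 0 : isupper(t_ch) != 0;
	if (!strcmp(name, "lower"))
		return fold ? isalpha(t_ch) != 0 : islower(t_ch) != 0;
	return 0; /* other classes left out of this sketch */
}
```

So match_class("upper", 'a', WM_CASEFOLD) matches, while
match_class("upper", 'a', 0) does not.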
--
Duy