ellis added a comment.

In D154014#4462886 <https://reviews.llvm.org/D154014#4462886>, @MaskRay wrote:

> In D154014#4461076 <https://reviews.llvm.org/D154014#4461076>, @rupprecht 
> wrote:
>
>> In D154014#4457883 <https://reviews.llvm.org/D154014#4457883>, @MaskRay 
>> wrote:
>>
>>>> This is a breaking change since some SCLs might use .* or (abc|def) which 
>>>> are supported regexes but not valid globs. Since we have just cut clang 
>>>> 16.x this is a good time to make this change.
>>>
>>> My user has some ignore lists, but there is no `^[a-z]+:.*\(` or 
>>> `^[a-z]+:\.` occurrence, so this change is likely safe for us.
>>
>> I think I'm looking at the same lists, and I see plenty of `.*` in the 
>> sanitizer exclusion lists. Also a few cases of `(abc|def|...)`. IIUC, those 
>> would both be broken by this -- instead of `.*` meaning "any character any 
>> number of times" (regex) it would mean "dot followed by any number of 
>> characters" (glob), right? And the `(abc|...)` would just be that literal 
>> text, not matching the individual parts?
>
> Sorry that my regex use was incorrect. We do have some `src:x/a.pb.*` style 
> rules, which are technically misused. `src:x/a.pb.*` can match possibly 
> unintended files like `x/axpbx`.
> To match just glob `x/a.pb.*` files, the rule should be written as 
> `src:x/a\.pb\.*` (`*` instead of `.*`), but I doubt anyone writes rules using 
> `\.`.
>
>> If my understanding of that is correct, I don't think this is a good change 
>> -- there's possibly plenty of configs out there that assume this is regex, 
>> and there doesn't seem to be sufficient motivation to just break those. But 
>> I can see that globs are useful for the examples posted in the patch 
>> description. Is it possible to have some middle ground, e.g. default to 
>> regex but allow a config at the top of sanitizer lists to interpret future 
>> patterns as globs instead?
>
> I think the main breakages are:
>
> - patterns using `(` and `)`. Our users don't have such patterns.
> - patterns using `\.` to mean a literal `.`, e.g. `src:a\.c`. Our users don't 
> use `\`. Such a pattern needs to changed to `src:a.c` (which used to match 
> other files like `axc`)
> - patterns like `src:a.pb.*` that match other files like `src:axpbx.c`. This 
> pattern can be changed to glob `src:a?pb?*`.
>
> I think all of these are likely uncommon. While `\.` is technically the 
> correct way to match a literal `.` before this patch, most people likely just 
> use `.`.
>
> In summary I think this patch is moving toward the right direction and making 
> the patterns less error-prone.

Actually, the code 
<https://github.com/llvm/llvm-project/blob/0e93c4a5fd5825a16fd3c00f9f7942ec88412771/llvm/lib/Support/GlobPattern.cpp#L94-L103>
 will escape any character after `\`, not just special characters, so that 
`a\\` will match `a\`, `a\.` will match `a.`, and `a\a` will match `aa` (I'll 
have to make sure there are test cases for this). You are right that `.` is no 
longer a special character, which I believe is a major improvement.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154014/new/

https://reviews.llvm.org/D154014

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to