Re: Bash glob range [0-5] in UTF-8 locale misses ¹, ² & ³

Grisha Levit Mon, 08 Sep 2025 02:01:19 -0700

Sorry, this section isn't quite right:

On Mon, Sep 8, 2025 at 2:25 AM Grisha Levit <[email protected]> wrote:
>
> So, in fact, locale-aware collation is disabled only if the range start
> and end codepoints are both in the range U+0001..U+00FF.  This doesn't
> make much sense for codepoints in the range U+0080..U+00FF.
>
> We should either:
>
>   * Remove the <= UCHAR_MAX checks (which would make the behavior match
>     the documentation)
>   * Replace the <= UCHAR_MAX checks with <= 0x7f checks (and update the
>     documentation to note that C locale-style comparisons are done only
>     if both ends of the range are ASCII characters)
>


Instead:

So, in fact, locale-aware collation is disabled only if the range boundary
and character being tested are both codepoints in the range U+0001..U+00FF.

This doesn't make much sense for codepoints in the range U+0080..U+00FF, so
the <= UCHAR_MAX check should be <= 0x7f. (Note that invalid byte sequences,
which do not form valid characters, do not hit this code path.)
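
To make the difference between the two limits concrete, here is a minimal
sketch of the condition. It is not the actual bash source: the helper name
cmp_is_c_style and its limit parameter are hypothetical, and only the
<= UCHAR_MAX versus <= 0x7f comparison is taken from the description above.

```c
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper (not a bash function): returns true when a range
   boundary and the character being tested would be compared by codepoint
   value instead of by locale-aware collation.  Pass UCHAR_MAX as `limit`
   for the current check and 0x7f for the proposed one. */
static bool
cmp_is_c_style (wint_t boundary, wint_t tested, wint_t limit)
{
  return boundary <= limit && tested <= limit;
}
```

Under these assumptions, for the boundary '5' and the character U+00B2 the
helper returns true with UCHAR_MAX (so U+00B2 is compared by codepoint value
and falls outside [0-5]) and false with 0x7f. For U+2074 it returns false
with either limit.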

Also, I'm not sure it makes much sense that with globasciiranges on, an
ASCII-only range like [0-5] still matches characters like U+2074 (as in
OP's example).

Also, the documentation suggests that C locale-style collation applies
to all ranges in globs, though the presence of "ascii" in the name makes
the intended effect unclear.

We could probably just remove the <= UCHAR_MAX checks (though this would
make the option more like "globcranges").

Alternatively, we could check that the _range_ ends are ASCII characters
(and, depending on the desired behavior, check the character being tested
as well) before disabling locale-aware collation.
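
A rough sketch of that alternative, assuming a hypothetical helper
range_is_c_style that sees both ends of the range at once (how the real
code would get at both ends is not covered here). The check_tested flag
stands in for the "depending on the desired behavior" choice.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper (not a bash function): decide whether to skip
   locale-aware collation for a bracket range [lo-hi].  When `check_tested`
   is true, the character being tested must be ASCII as well. */
static bool
range_is_c_style (wint_t lo, wint_t hi, wint_t tested, bool check_tested)
{
  bool ends_ascii = lo <= 0x7f && hi <= 0x7f;

  return ends_ascii && (!check_tested || tested <= 0x7f);
}
```

With [0-5] and U+2074, this returns true when only the range ends are
checked, so U+2074 would be compared by codepoint value and would no longer
match. With the tested character checked too, it returns false and
locale-aware collation would still apply.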
