Re: Bash glob range [0-5] in UTF-8 locale misses ¹, ² & ³

Grisha Levit Mon, 08 Sep 2025 12:51:24 -0700

On Mon, Sep 8, 2025, 13:28 Chet Ramey <[email protected]> wrote:

> On 9/8/25 4:59 AM, Grisha Levit wrote:
>
> > So, in fact, locale-aware collation is disabled only if the range boundary
> > and character being tested are both codepoints in the range U+0001..U+00FF.
> >
> > This doesn't make much sense for codepoints in the range U+0080..U+00FF, so
> > the <= UCHAR_MAX check should be <=0x7f. (Note that invalid byte sequences
> > that do not form valid characters do not hit this code path)
>
> No, it's perfectly ok to have range expressions with endpoints in that
> range, if uncommon.
>


Sure, my point was that, in a UTF-8 locale, a range expression like
[$'\x80'-$'\xFF'] would not be processed by this code. It's flagged as an
invalid multi-byte string in xstrmatch and the single-byte matching
functions are (correctly) used.

Whenever the charcmp_wc args wc{1,2} are >= 0x80 and <= UCHAR_MAX, they
represent codepoints in the U+0080..U+00FF range -- so they should not be
treated any differently than codepoints > UCHAR_MAX.
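
To make the difference concrete, here is a minimal sketch of the comparison
being discussed. It is not the bash source: charcmp_sketch, in_range, the
raw_limit parameter and toy_coll are all hypothetical names, and toy_coll is
a made-up stand-in for wcscoll() that places the superscripts next to their
base digits (the ordering the subject line expects in a UTF-8 locale).

```c
#include <limits.h>
#include <wchar.h>

/* Collation callback; real code would defer to wcscoll() instead. */
typedef int (*coll_fn)(wint_t, wint_t);

/* HYPOTHETICAL toy collation key: treat U+00B9, U+00B2, U+00B3 as
   sorting together with the digits 1, 2, 3. Everything else sorts by
   codepoint. This only imitates a locale; it is an assumption. */
static wint_t toy_key(wint_t c)
{
    switch (c) {
    case 0x00B9: return '1';
    case 0x00B2: return '2';
    case 0x00B3: return '3';
    default:     return c;
    }
}

int toy_coll(wint_t a, wint_t b)
{
    wint_t ka = toy_key(a), kb = toy_key(b);

    if (ka != kb)
        return (ka > kb) - (ka < kb);
    return (a > b) - (a < b);       /* tie-break by codepoint */
}

/* Compare two wide characters. When both are <= raw_limit the raw
   codepoint values are compared; otherwise collation decides.
   raw_limit = UCHAR_MAX models the check as described in the thread,
   0x7f models the narrower check, 0 models removing the check. */
int charcmp_sketch(wint_t c1, wint_t c2, wint_t raw_limit, coll_fn coll)
{
    if (c1 == c2)
        return 0;
    if (c1 <= raw_limit && c2 <= raw_limit)
        return (c1 > c2) - (c1 < c2);
    return coll(c1, c2);
}

/* Does c fall inside the range expression [lo-hi]? */
int in_range(wint_t c, wint_t lo, wint_t hi, wint_t raw_limit, coll_fn coll)
{
    return charcmp_sketch(c, lo, raw_limit, coll) >= 0
        && charcmp_sketch(c, hi, raw_limit, coll) <= 0;
}
```

With this toy collation, in_range(0x00B2, '0', '5', UCHAR_MAX, toy_coll)
is 0, because U+00B2 is compared as the raw value 0xB2 and lands above '5'.
Passing 0x7f or 0 as raw_limit sends the comparison through collation and
the result is 1, while plain ASCII digits give the same answer either way.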

> > * Remove the <= UCHAR_MAX checks (which would make the behavior match
> >   the documentation)
>
> This is the right fix.


I agree.
