On 9/8/25 2:24 AM, Grisha Levit wrote:
On Sun, Sep 7, 2025 at 2:46 AM Duncan Roe wrote:
`ls -1 [0-5]*` should produce the same output as `ls -1` but instead:-
[...]
superscripts ¹, ² & ³ are missing.

My take at an explanation: '₀' - '₉' are Unicode U+2080..U+2089. These display fine.
'⁰' is U+2070 & '⁹' is U+2079, but '¹' is U+00B9, '²' is U+00B2 & '³' is U+00B3.

This appears to be a bug with the globasciiranges option.

The documentation suggests that enabling this option will disable locale-
aware collation in range expressions:

Yes, that's the idea. The range depends on codepoints rather than locale-
specific collating sequences. See

https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

for "rational range interpretation."


       globasciiranges
           If set, range expressions used in pattern matching bracket
           expressions (see Pattern Matching above) behave as if in
           the traditional C locale when performing comparisons.  That
           is, pattern matching does not take the current locale’s
           collating sequence into account, so b will not collate
           between A and B, and upper‐case and lower‐case ASCII
           characters will collate together.

But the implementing code [1] for multibyte locales does the following:

    charcmp_wc (wint_t c1, wint_t c2, int forcecoll)
    ...
      if (forcecoll == 0 && glob_asciirange && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX)
        return ((int)(c1 - c2));
    ...
      return (wcscoll (s1, s2));

So, in fact, locale-aware collation is disabled only if the range start
and end codepoints are both in the range U+0001..U+00FF.  This doesn't
make much sense for codepoints in the range U+0080..U+00FF.
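
To make the effect of that guard concrete for the characters in the report,
here is a minimal C sketch. uses_codepoint_compare is a hypothetical helper,
not bash code: it copies only the condition quoted above and reports whether
a pair of characters would be compared by code point or handed to wcscoll().

```c
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper for illustration only (not part of bash).
   Mirrors the guard in the quoted charcmp_wc excerpt: returns true when
   the two characters would be compared by plain code point subtraction,
   false when the comparison would fall through to wcscoll(). */
static bool
uses_codepoint_compare (wint_t c1, wint_t c2, int forcecoll, int glob_asciirange)
{
  return forcecoll == 0 && glob_asciirange
         && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX;
}
```

With globasciiranges set, the pair '0' (U+0030) and '¹' (U+00B9) passes the
guard, so '¹' is compared by code point and lands outside [0-5]. The pair '0'
and '⁰' (U+2070) fails it, so the locale's collation decides instead. That
would be consistent with only ¹, ² and ³ going missing.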

Maybe not common, but it's perfectly valid.


We should either:

   * Remove the <= UCHAR_MAX checks (which would make the behavior match
     the documentation)

This is the right fix.
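
As a rough illustration of what dropping the bound means for the reported
glob, the sketch below tests range membership purely by code point. The name
in_codepoint_range is made up for this example; it is not the actual patch
and not a function in bash.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper for illustration only (not part of bash).
   Range membership decided by code point alone, which is the comparison
   left over once the UCHAR_MAX bound no longer applies. */
static bool
in_codepoint_range (wint_t c, wint_t lo, wint_t hi)
{
  return lo <= c && c <= hi;
}
```

Under this comparison '3' (U+0033) is inside '0'..'5', while '¹' (U+00B9),
'⁰' (U+2070) and '₀' (U+2080) are all outside it, so none of the superscript
or subscript digits would match [0-5] when globasciiranges is set.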

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    [email protected]    http://tiswww.cwru.edu/~chet/
