Re: bash 5.2.21-1: a bug in [0-9] expansion

Sam Edge via Cygwin Mon, 01 Sep 2025 14:24:57 -0700

On 01/09/2025 18:19, Brian Inglis via Cygwin wrote:
> On 2025-08-31 13:06, Mariusz Wodzicki via Cygwin wrote:
>> Description of the problem.

>> [0-9] also picks up certain Unicode superscript characters (namely ⁰ ⁴ ⁵ ⁶
>> ⁷ ⁸ ⁹), and every Unicode subscript character.
>>
>> Example: the directory has the following files:
>> $ /bin/ls
>> ₀.txt  ₁.txt  ₂.txt  ₃.txt  ₄.txt  ₅.txt  ₆.txt  ₇.txt ₈.txt  ₉.txt
>> ⁰.txt  ¹.txt  ².txt  ³.txt  ⁴.txt  ⁵.txt  ⁶.txt  ⁷.txt ⁸.txt  ⁹.txt
>>
>> $ /bin/ls [0-9].txt
>> ₀.txt  ₁.txt  ₃.txt  ⁴.txt  ⁵.txt  ⁶.txt  ⁷.txt  ⁸.txt
>> ⁰.txt  ₂.txt  ₄.txt  ₅.txt  ₆.txt  ₇.txt  ₈.txt
>>
>> $ locale
>> LANG=en_US.UTF-8
>> LC_CTYPE="en_US.UTF-8"
>> LC_NUMERIC="en_US.UTF-8"
>> LC_TIME="en_US.UTF-8"
>> LC_COLLATE="en_US.UTF-8"
>> LC_MONETARY="en_US.UTF-8"
>> LC_MESSAGES="en_US.UTF-8"
>> LC_ALL=
>>
>> System.
>> Fully up to date Windows 11
>> cygwin 3.6.4-1
>> bash    5.2.21-1
>

> For reproducible results prefix commands with LC_ALL=C … or possibly just LC_COLLATE=C or LC_CTYPE=C or =POSIX to standardize the locale, otherwise many commands will respect the current locale, and some respect Unicode regardless of locale e.g. `info wc`:

> "Unless the environment variable ‘POSIXLY_CORRECT’ is set, GNU ‘wc’ treats the following Unicode characters as white space even if the current locale does not: U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK SPACE, and U+2060 WORD JOINER."

> For GNU utilities, where info pages are preferred, such as coreutils*, compiler and language processors, and tools packages, many details do not appear in the man pages, for example:

> "Full documentation <https://www.gnu.org/software/coreutils/wc> or available locally via: info '(coreutils) wc invocation'"

>
> although `info wc` shows the same page.
>
> —————

> * [ arch b2sum base32 base64 basename cat chcon chgrp chmod chown chroot cksum comm cp csplit cut date dd df dir dircolors dirname du echo env expand expr factor false fmt fold gkill groups head hostid id install join link ln logname ls md5sum mkdir mkfifo mknod mktemp mv nice nl nohup nproc numfmt od paste pathchk pinky pr printenv printf ptx pwd readlink realpath rm rmdir runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum shred shuf sleep sort split stat stdbuf stty sum sync tac tail tee test timeout touch tr true truncate tsort tty uname unexpand uniq unlink users vdir wc who whoami yes

Bash is GNU but isn't part of coreutils as far as I know. Type 'man bash' and then read the 'Pattern Matching' section for its globbing behaviour.

TL;DR For bash 5.2, using 'export LC_ALL=C.UTF-8' as Brian suggests or 'export LC_COLLATE=C.UTF-8' or 'shopt -s globasciiranges' should revert to simple ASCII ranges for '[0-9]', '[a-z]' etc.

I'm seeing the correct behaviour with up-to-date Cygwin bash/coreutils etc. by the way. 'echo [0-9]*' only expands out sub/super-digits if I use 'LC_COLLATE=en_GB.UTF-8' or similar with 'shopt -u globasciiranges'.
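
A minimal sketch of the locale approach from the TL;DR above, for anyone who wants to check it in a scratch directory. The helper name 'ascii_digit_glob' and the sample file names are made up for illustration, and it uses plain 'LC_ALL=C' rather than 'C.UTF-8' on the assumption that the C locale exists everywhere. The assignment is made inside the shell that expands the pattern, because the shell expands a glob itself before it runs the command, so a one-off 'LC_ALL=C ls [0-9].txt' prefix would not be expected to change which names match.

```shell
# Hypothetical helper: print the names in directory $1 that match [0-9].txt
# when the pattern is expanded under the C locale.
ascii_digit_glob() (
    # Runs in a subshell, so the caller's locale is left untouched.
    LC_ALL=C
    export LC_ALL
    cd "$1" || exit 1
    for f in [0-9].txt; do
        # If nothing matches, the pattern stays literal; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

# Scratch directory with ASCII digits plus a few sub/superscript digits.
demo_dir=$(mktemp -d)
touch "$demo_dir/0.txt" "$demo_dir/5.txt" "$demo_dir/9.txt" \
      "$demo_dir/₀.txt" "$demo_dir/¹.txt" "$demo_dir/⁴.txt"

ascii_digit_glob "$demo_dir"
# Prints:
# 0.txt
# 5.txt
# 9.txt
```

In the C locale the pattern is matched byte by byte, so the multi-byte sub/superscript names cannot match a single '[0-9]' followed by '.txt'.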



--
Sam Edge


--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
