On Sat, Sep 13, 2025, at 11:02 AM, Greg Wooledge wrote:
> I really think this is a bad idea. A script needs to have predictable
> behavior regardless of what bizarre locales may exist on the target
> system.
It turns out you don't even need a particularly "bizarre" locale to
observe this. ISO/IEC 8859-1 encodes NBSP as 0xA0, so on macOS:
$ export LC_ALL=en_US.ISO8859-1
$ [[ $'\xA0' = [[:blank:]] ]]; echo "$?"
0
$ eval set a$'\xA0'z; echo "$# args"
2 args
Behavior varies among shells. dash, mksh, oksh, and both ksh93
builds don't recognize 0xA0 as a <blank>; zsh recognizes it as a
<blank> but doesn't delimit tokens with it; and yash agrees with
bash on both counts. (Various compatibility modes don't make a
difference here.)
$ cat /tmp/nbsp_test.sh
nbsp=$(printf '\240')
case $nbsp in
    [[:blank:]])
        printf 'blank, '
        ;;
    *)
        printf 'not blank, '
        ;;
esac
eval set "a${nbsp}z"
case $# in
    1)
        echo not delimiting
        ;;
    2)
        echo delimiting
        ;;
esac
$ export LC_ALL=en_US.ISO8859-1
$ /bin/bash /tmp/nbsp_test.sh # bash 3.2.57
blank, delimiting
$ ~/build/bash/bash "$_" # bash devel
blank, delimiting
$ dash "$_" # dash 0.5.12
not blank, not delimiting
$ /bin/ksh "$_" # ksh93u+ 2012-08-01
not blank, not delimiting
$ ksh "$_" # ksh93u+m/1.0.10 2024-08-01
not blank, not delimiting
$ mksh "$_" # mksh R59 2020/10/31
not blank, not delimiting
$ oksh "$_" # OpenBSD 7.7 ksh
not blank, not delimiting
$ yash "$_" # yash 2.58.1
blank, delimiting
$ zsh "$_" # zsh 5.9
blank, not delimiting
POSIX seems to require delimiting on every <blank> [*], without
qualification:
7. If the current character is an unquoted <blank>, any
token containing the previous character is delimited
and the current character shall be discarded.
yash takes this very seriously: it delimits on NBSP even in a UTF-8
locale, where bash does not.
$ export LC_ALL=en_US.UTF-8
$ [[ $'\uA0' = [[:blank:]] ]]; echo "$?"
0
$ bash -c 'set a'$'\uA0''z; echo "$# args"'
1 args
$ yash -c 'set a'$'\uA0''z; echo "$# args"'
2 args
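
For anyone who wants to repeat the UTF-8 comparison in a shell that
may not support $'\uA0', here is a rough sketch. The classify
function is a hypothetical helper of mine, not part of the test script
above; it spells NBSP as the UTF-8 bytes C2 A0 with printf octal
escapes and assumes the shell accepts [[:blank:]] in case patterns.
What it prints for NBSP depends on the shell and the locale, so I
haven't listed any expected output for that call.

```shell
# classify CHAR: report whether the running shell treats CHAR as a
# <blank> in pattern matching, and whether CHAR delimits tokens when
# the shell parses it unquoted.
classify() {
    case $1 in
        [[:blank:]]) printf 'blank, ' ;;
        *)           printf 'not blank, ' ;;
    esac
    # eval re-parses the string, so a character the tokenizer treats
    # as a <blank> yields two arguments instead of one.
    eval "set -- a${1}z"
    case $# in
        1) echo 'not delimiting' ;;
        *) echo 'delimiting' ;;
    esac
}

classify ' '                      # ASCII space, as a sanity check
classify "$(printf '\302\240')"   # U+00A0 encoded as UTF-8
```

Run it under each shell with LC_ALL set to the locale of interest,
e.g. LC_ALL=en_US.UTF-8.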
[*]
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/V3_chap02.html#tag_19_03
--
vq