On Mon, May 1, 2023 at 11:48 AM Chet Ramey <chet.ra...@case.edu> wrote:
>
> (And once we get these issues straightened out, if you look back to your
> original example, 0x240 is a blank in my locale, en_US.UTF-8, and will be
> removed from the input stream by the parser unless it's quoted.)
On at least recent macOS versions, it seems that the ctype.h functions treat [0x80..0xFF] the same as the wctype.h functions would. So while U+00A0 is a space character in the en_US.UTF-8 locale, and iswspace(L'\u00A0') returns 1, it is also the case that isspace(0xA0) returns 1. But I don't think it's correct to actually rely on the latter, since the single byte 0xA0 doesn't represent _any_ character in the locale, much less a space. (I think that's the reason for the behavior Chet noted above from a previous thread.)

For example, these outputs would be correct with \uA0 in place of \xA0 below, but I don't think the current behaviour is expected:

    $ eval $'printf "<%s>" [\xA0\xA0]'
    <[><]>
    $ [[ $'\xA0' == [[:space:]] ]]; echo $?
    0

Perhaps on platforms like this it would be appropriate to mask ctype results with something equivalent to `btowc(c) != WEOF'?

(See http://www.openradar.me/FB9973780 for an example of the issue in an Apple-supplied program.)
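To make the masking idea concrete, here is a minimal sketch in C. The helper name masked_isspace is hypothetical and is not taken from bash's source; it only illustrates rejecting any byte that btowc() says is not a complete character in the current locale before trusting isspace().

```c
#include <ctype.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from bash): report a byte as whitespace only
 * if it is, on its own, a valid single-byte character in the current
 * locale. `c' must be an unsigned char value or EOF, as for isspace().
 *
 * btowc() returns WEOF for EOF and for any byte that does not form a
 * complete character in the initial shift state, so in a UTF-8 locale
 * a lone 0xA0 is rejected no matter what isspace(0xA0) claims.
 */
static int
masked_isspace(int c)
{
    return btowc(c) != WEOF && isspace(c) != 0;
}
```

In a UTF-8 locale, 0xA0 is only a continuation byte, so masked_isspace(0xA0) should be 0 even on platforms where isspace(0xA0) is nonzero, while ASCII blanks such as ' ' and '\t' are unaffected. The same wrapper shape would apply to the other ctype.h predicates.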