On 9/16/25 10:58 PM, Martin D Kealey wrote:
On Tue, 16 Sept 2025 at 01:46, Chet Ramey <[email protected]> wrote:
On 9/15/25 10:46 AM, Robert Elz wrote:
> If it was intended to mean "parsing the script" it would certainly
say so.
Doesn't the fact that the discussion of token recognition includes the
references to <blank>s imply this?
There's an argument to utility that strongly suggests that this is not
what's intended.
If you don't think it's intended, file an interpretation request with the
austin group and see what the group says.
Having encoding be specified as a sub-aspect of a locale seemed like a
reasonable design choice in the 1970s,(*1)
Locales weren't a thing in the 1970s.
Perhaps it's worth considering the <alpha> category, which in some national
ASCII variants(*4) would include codepoints 0x5b…0x5e & 0x7b…0x7e. It
would make no sense(*6) to treat the US-ASCII symbols @ [ \ ] ^ ` { | } ~
as “part of an identifier”(*7) while parsing a script, and LC_CTYPE(*8)
should /clearly/ be ignored in this case.
POSIX variables are names. Names are
"In the shell command language, a word consisting solely of underscores,
digits, and alphabetics from the portable character set. The first
character of a name is not a digit."
LC_CTYPE is not an issue.
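
As a rough illustration of that definition, here is a minimal sketch of a name check that never consults LC_CTYPE. The function name is_posix_name is hypothetical, and this is not how bash itself implements the test. The characters are listed explicitly because bracket ranges such as [a-z] can behave differently outside the POSIX locale.

```shell
# Hypothetical helper: succeeds if $1 is a POSIX "name", i.e. it consists
# solely of underscores, digits, and alphabetics from the portable character
# set, and does not start with a digit. The explicit character lists avoid
# locale-dependent ranges and character classes.
is_posix_name() {
    case $1 in
        '') return 1 ;;                 # the empty string is not a name
        [0123456789]*) return 1 ;;      # the first character must not be a digit
        *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*)
            return 1 ;;                 # contains a character outside the allowed set
        *) return 0 ;;
    esac
}
```

With this sketch, `is_posix_name foo_1` succeeds, while `is_posix_name 1foo` and `is_posix_name 'a[b'` fail whatever the current locale says about <alpha>.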
So I would argue that for
consistency it should be ignored entirely while parsing.
Given the above definition, which is not the same language used to describe
the effect of <blank> characters on token recognition, there's no
consistency argument.
I support Greg Wooledge's request: if you feel obliged to implement locale-
aware parsing, please gate it on being in Posix mode, and/or a new default-
off option.
`Locale-aware' parsing (in the sense that <blank>s are defined by the
LC_CTYPE category in the current locale) has been in bash for as long as
it's been in POSIX (2003, bash-3.0). I think there are two issues: the
obvious backward compatibility concern, and the undesirability of shells
named sh and bash behaving differently here.
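
To make the locale dependence concrete, here is a small sketch that asks the current locale whether a single character is a <blank>. The function name is_blank is hypothetical, and this only exercises the [[:blank:]] class in pattern matching; it is not a description of how the bash parser is implemented.

```shell
# Hypothetical helper: succeeds if its one-character argument belongs to the
# <blank> class of the current locale's LC_CTYPE category.
is_blank() {
    case $1 in
        [[:blank:]]) return 0 ;;   # class membership comes from LC_CTYPE
        *) return 1 ;;
    esac
}
```

In the POSIX locale only <space> and <tab> are <blank>s, so `is_blank ' '` succeeds and `is_blank a` fails; other locales may place additional characters in the class.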
Additionally, please can this be highlighted in the manual as
“experimental” and “may be withdrawn in a future version of Bash” or
similar, until such time as there's an official interpretation issued by
the Posix WG.
There is an official interpretation -- it's the language in the standard.
If you want another, you'll have to file an interpretation request.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU [email protected] http://tiswww.cwru.edu/~chet/