In that case, the answer is simple: The shell swiftly rejects the script, and provides a clear reason why it cannot be run. ("bash: Script requires the en_US.utf8 locale which is not installed on this system. Sorry, dude.")
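To make that concrete, here is a rough sketch of the check written as a wrapper in plain bash, not as anything bash does today. The "# encoding: <locale>" marker syntax, its position (line 1 or 2), and the function names script_locale and check_script are all made up for illustration. The list of installed locales is assumed to come from `locale -a`, and names are compared exactly, with no normalization between spellings like en_US.utf8 and en_US.UTF-8.

```shell
#!/usr/bin/env bash
# Sketch only: the "# encoding: <locale>" marker and both function names are
# hypothetical; bash itself has no such feature.

# Print the locale named by a marker on line 1 or 2 of the script, if any.
script_locale() {
    local file=$1 line n=0
    local re='^#[[:space:]]*encoding:[[:space:]]*([A-Za-z0-9_.@-]+)'
    # Only the first two lines are examined, so the marker has a fixed location.
    while (( n++ < 2 )) && IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done < "$file"
    return 1
}

# Return 0 if the script may run, 1 if it names a locale that is not installed.
# $2 optionally overrides the list of installed locales (one per line);
# by default the list is taken from `locale -a`.
check_script() {
    local file=$1 wanted available
    wanted=$(script_locale "$file") || return 0   # no marker: behave as today
    available=${2-$(locale -a 2>/dev/null)}
    if ! grep -Fxq -- "$wanted" <<< "$available"; then
        printf 'Script requires the %s locale which is not installed on this system.\n' "$wanted" >&2
        return 1
    fi
}
```

Something like `check_script ./foo.sh && bash ./foo.sh` would then refuse the script up front, before any of it has run.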
This, in my opinion, is certainly preferable to the current situation, in which the script runs, and
- MAYBE fails at an UNKNOWN time with an unhelpful message like "bash: $'\351\211\204\344\272\272': command not found"
- MAYBE fails in a more subtle, unforeseen way (e.g. word-splitting in the middle of a command name or identifier, then running the wrong command with a garbage argument)

This is also why I think this should be an optional "encoding marker" at a fairly fixed location in the file, rather than an option setting that could occur anywhere in the script: It allows an incompatible script to be immediately identified and rejected before it does anything.

----- Original Message -----
From: "Greg Wooledge" <wool...@eeg.ccf.org>
To: <tetsu...@scope-eye.net>
Cc: "bug-bash" <bug-bash@gnu.org>
Sent: Tue, 13 Jun 2017 17:00:10 -0400
Subject: Re: RFE: Please allow unicode ID chars in identifiers

On Tue, Jun 13, 2017 at 04:44:08PM -0400, tetsu...@scope-eye.net wrote:
> For that to work, basically the character encoding used to interpret
> the script should be (potentially) distinct from the one used to
> interact with the rest of the system.
>
> ...But that gets complicated: the shell would need to interpret the
> script in its locale of origin, but still respect the locale for other
> matters of I/O. [...]

The main issue here is that the author's locale may not *exist* on the user's machine. There may not be any way for bash to determine which non-ASCII characters constitute "letters" in the author's locale.