Please excuse the top-posting, this mail client isn't very good...
To some extent, tying the shell script language to the locale is unavoidable. However, one of the points I was trying to make is that, in principle at least, this shouldn't be the case. If a script is written in a particular character encoding (and uses characters from that encoding in its function names or parameter names, for instance), it should still run correctly even if it's run in a different locale, just as a compiled program should be able to run in a locale other than the one in which its source code was authored.

For that to work, the character encoding used to interpret the script should be (potentially) distinct from the one used to interact with the rest of the system.

But that gets complicated: the shell would need to interpret the script in its locale of origin, but still respect the current locale for other matters of I/O. And since data in the shell intermingles with programming constructs in the shell (variables get passed by name, command and function names get stored in, and invoked from, shell variables, variable values and "here" docs come from the script, etc.), it raises questions like: do we have to track the character encoding for each variable in the script? When do we transcode between encodings? And what happens when a transcoding isn't possible?

So maybe the whole thing is just reaching too far... But that's how I'd want to approach it: I'd want people to be able to use their character set in their scripts, but I'd want it to work in such a way that a script, once written, can work regardless of the active locale.
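
As a rough illustration of how the locale already leaks into the way the shell reads data that comes from the script itself, here is a small sketch. It assumes bash (the $'...' quoting is not plain POSIX sh), and the helper name len_in_locale is made up for this example. It asks the shell for the length of the same two bytes under different locales:

```shell
#!/usr/bin/env bash
# The two bytes 0xC3 0xA9 are the UTF-8 encoding of a single character.
word=$'\xc3\xa9'

# Hypothetical helper: print the length of string $2 as the shell
# counts it when LC_ALL is set to $1. The subshell keeps the locale
# change from leaking into the rest of the script.
len_in_locale() {
    ( LC_ALL=$1; printf '%s\n' "${#2}" )
}

# In the C locale every byte is one character, so this prints 2.
len_in_locale C "$word"

# Under whatever locale the caller is using, the result depends on
# that locale's encoding (a UTF-8 locale would count one character).
len_in_locale "${LANG:-C}" "$word"
```

The bytes in the script never change, but what the shell considers a "character" does, which is the kind of ambiguity a per-script encoding would have to resolve.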
----- Original Message -----
From: chet.ra...@case.edu
To: <tetsu...@scope-eye.net>, "dualbus" <dual...@gmail.com>, "L A Walsh" <b...@tlinx.org>
Cc: <chet.ra...@case.edu>, "bug-bash" <bug-bash@gnu.org>
Sent: Tue, 13 Jun 2017 15:04:24 -0400
Subject: Re: RFE: Please allow unicode ID chars in identifiers

On 6/2/17 12:54 PM, tetsu...@scope-eye.net wrote:

> - As you pointed out, this requires the shell to somehow establish a
> convention governing the character set used to interpret shell scripts

It's actually the same one that is currently used: the current locale.

>
> But, on the other hand:
> - Even if your editor or terminal can't display the UTF-8 code, that
> doesn't mean the shell process can't RUN it.

As long as the locale is set appropriately.

> 2: For a script, the character encoding of commands must be explicitly
> specified, probably via a shell option.

You can already do this by setting the various locale environment variables.