Re: RFE: Please allow unicode ID chars in identifiers

George Tue, 13 Jun 2017 18:28:44 -0700

On Tue, 2017-06-13 at 20:14 -0400, Chet Ramey wrote:
> On 6/13/17 5:19 PM, tetsu...@scope-eye.net wrote:
> > 
> > 
> > In that case, the answer is simple:
> > 
> > The shell swiftly rejects the script, and provides a clear reason why
> > it cannot be run. ("bash: Script requires the en_US.utf8 locale which
> > is not installed on this system. Sorry, dude.")
> The shell has no business doing this. If a script requires a certain
> locale, and won't run correctly without it, the author can ensure that
> an assignment to LC_CTYPE produces the desired results.
> 
I already addressed this. Changing LC_CTYPE doesn't just affect how the
shell interprets the script. It also changes how various other I/O
operations occur and how filenames are processed, and (presumably,
assuming the locale is exported) the setting is inherited by commands
run by the script as well.

If my system's locale were based on GB18030, and I ran a shell script
encoded in UTF-8 whose author had the bright idea to set LC_CTYPE to
en_US.utf8 to make the shell work in any locale, then I haven't
succeeded in "running a script in a different locale than the one it
was written in", because once LC_CTYPE has been reset I am no longer IN
my system's locale for the duration of that script.

The script doesn't necessarily need to run IN a particular locale; it
needs to be INTERPRETED according to a certain locale, because locale
settings influence how the parser works.
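
To make the inheritance point concrete, here is a minimal sketch. The
function name show_inherited_ctype is made up for illustration, and the
subshell is only there so the caller's own setting is left alone. It
shows that once LC_CTYPE is assigned and exported, a command started by
the script sees the new value rather than the system's:

```sh
# Hypothetical helper: override LC_CTYPE the way a script might, then
# ask a child process what value it inherited.
show_inherited_ctype() {
    (
        # The parentheses run this in a subshell, so the caller's
        # LC_CTYPE is not modified.
        LC_CTYPE=$1
        export LC_CTYPE
        # The child shell prints the LC_CTYPE it received.
        sh -c 'printf "%s\n" "$LC_CTYPE"'
    )
}
```

Calling "show_inherited_ctype C" prints the overridden value as seen by
the child, which is exactly the side effect I'm objecting to: the
override does not stay confined to how the script itself is parsed.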
> > 
> > This is also why I think this should be an optional "encoding marker"
> > at a fairly fixed location in the file, rather than an option setting
> > that could occur anywhere in the script: It allows an incompatible
> > script to be immediately identified and rejected before it does
> > anything.
> >
>
> This is relatively trivial to do with a shell function.

Sure, I just don't think that's the right answer.

If this method of supporting cross-locale scripts were adopted (and
honestly, that possibility seems pretty remote, but I'm enjoying the
discussion anyway), this is a check we'd want in place for pretty much
every script that uses the feature. Every time the script is run, we'd
want to know first, "can the shell run this script?" There's no point
repeating that bit of boilerplate in every single script, and nothing
is better placed to answer "can the shell run this?" than the shell
itself. If the answer is "no", then it doesn't make sense for the shell
to do anything BUT error out of processing the script. And we should
get that answer without *running* the script.
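
For reference, the per-script boilerplate under discussion might look
roughly like the sketch below. The names normalize_locale, have_locale
and require_locale are hypothetical, and the sketch assumes the
"locale -a" utility is available and that locale names differ only in
case and hyphens (en_US.UTF-8 versus en_US.utf8):

```sh
# Hypothetical helpers; not part of bash or any standard library.

# Fold case and drop hyphens so "en_US.UTF-8" matches "en_US.utf8".
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed only if the named locale appears in the output of `locale -a`.
# If `locale` is missing, the list is empty and the check fails.
have_locale() {
    wanted=$(normalize_locale "$1")
    for name in $(locale -a 2>/dev/null); do
        if [ "$(normalize_locale "$name")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Print a diagnostic and fail if the required locale is not installed.
require_locale() {
    if ! have_locale "$1"; then
        printf '%s: script requires the %s locale, which is not installed\n' \
            "$0" "$1" >&2
        return 1
    fi
}
```

A script would then start with something like
"require_locale en_US.utf8 || exit 1". Note that this check only happens
once the shell has already started parsing and executing the script,
which is the limitation I'm describing above.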
