On 6/13/17 4:44 PM, tetsu...@scope-eye.net wrote:
>
> Please excuse the top-posting, this mail client isn't very good...
>
> To some extent, tying the shell script language to the locale is
> unavoidable. However, one of the points I was trying to make is that, in
> principle, at least, this shouldn't be the case. If a script is written in
> a particular character encoding (and uses characters from that encoding in
> its function names or parameter names, for instance) it should still run
> correctly even if it's run in a different locale, just as a compiled
> program should be able to run in a locale other than the one in which its
> source code was authored.

This isn't a good comparison. Even a compiled program that calls one of the
ctype.h functions is dependent on the locale in which it's run. A script,
since it's text and interpreted, has the same dependency, to an even greater
extent.

If C source code contains character strings that are encoded in the author's
locale, you're going to get indeterminate results if you try to display them
in an environment using a different locale.

You can mitigate this somewhat by using the mechanisms available to control
the locale: for a C program it's setlocale(), and for a script it's the LC_
and LANG variables.

> For that to work, basically the character encoding used to interpret the
> script should be (potentially) distinct from the one used to interact with
> the rest of the system.

What "rest of the system"? What "matters of I/O"?

> ...But that gets complicated: the shell would need to interpret the script
> in its locale of origin, but still respect the locale for other matters of
> I/O. But since data in the shell intermingles with programming constructs
> in the shell (Variables get passed by name, command and function names get
> stored in (and invoked from) shell variables, variable values and "here"
> docs come from the script, etc.) it gets into questions like, do we have to
> track character encoding for each variable in the script? When do we
> transcode between encodings? And what happens when a transcoding isn't
> possible?
>
> So maybe the whole thing is just reaching too far... But that's how I'd
> want to approach it: I'd want people to be able to use their character set
> in their scripts, but I'd want it to work in a way that a script, once
> written, can work regardless of the active locale.

I assume that by this you mean the user's locale. You can still force a
different one.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
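
As a rough illustration of the point about forcing a locale from within a script, here is a minimal sketch, assuming bash. The helper name `is_alpha_in` is hypothetical and not from the thread; it only shows that the same pattern match can be pinned to a chosen locale by setting LC_ALL, which overrides LANG and the other LC_ variables.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): report whether the string in $2
# is a single alphabetic character under the locale named in $1.
is_alpha_in() {
    (
        # The subshell keeps the locale change from leaking into the caller.
        # LC_ALL takes precedence over LANG and every other LC_* variable.
        LC_ALL=$1
        export LC_ALL
        case $2 in
            [[:alpha:]]) echo yes ;;
            *)           echo no ;;
        esac
    )
}

is_alpha_in C a     # prints "yes"
is_alpha_in C 'é'   # prints "no": in the C locale only ASCII letters are alphabetic
```

Passing a different locale name as the first argument may give a different answer for the same character, provided that locale is installed on the system. Setting and exporting LC_ALL=C at the top of a script is the blunt version of the same idea: it pins the script's behaviour regardless of the user's locale, at the cost of treating non-ASCII text as raw bytes.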