Re: Patch for unicode in varnames...

Chet Ramey Tue, 13 Jun 2017 12:46:50 -0700

On 6/5/17 8:40 PM, Peter & Kelly Passchier wrote:
> On 06/06/2560 05:39, George wrote:
>> So if you had "Pokémon" as an identifier in a Latin-1-encoded script (byte 
>> value 0xE9 between the "k" and "m") and then tried running that script in a
>> UTF-8 locale, that byte sequence (0xE9 0x6D) would actually be invalid in 
>> UTF-8, so Eduardo's patch would indicate that the identifier is invalid and
>> fail to run the script.
> 
> I often work with a locale that has a UTF-8 encoding and a
> different/older encoding that are incompatible. I haven't tried the
> patch, but if I use unicode characters in function names, if I write a
> script in one encoding, and run it in an environment in the other
> encoding, it still runs correctly, but it won't render correctly.


This can lead to subtle failures. If a variable name uses a character that
is alphanumeric in the writer's locale but not in the default locale
where the script is executed, the writer has to set the locale explicitly to
keep the variable from causing a `not a valid identifier' error.
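
A rough way to see the encoding dependence discussed above is to decode the
same bytes under different encodings and test the result. The sketch below is
only an illustration in Python: it is not bash's identifier check and not
Eduardo's patch. The helper name is_identifier_in is hypothetical, and
Python's Unicode-based str.isalpha()/str.isalnum() stand in for the
locale-dependent character classification a shell would use.

```python
# Illustration only: a stand-in for a locale-aware identifier check.
# is_identifier_in is a hypothetical helper, not part of bash or the patch.

# "Pokémon" as bytes in each encoding (\u00e9 is e-acute).
LATIN1_NAME = "Pok\u00e9mon".encode("latin-1")  # contains the single byte 0xE9
UTF8_NAME = "Pok\u00e9mon".encode("utf-8")      # contains the bytes 0xC3 0xA9


def is_identifier_in(data: bytes, encoding: str) -> bool:
    """Decode data with the given encoding, then require a letter or
    underscore first and only letters, digits, or underscores after."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not even a valid sequence in this encoding.
        return False
    if not text or not (text[0].isalpha() or text[0] == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in text)


if __name__ == "__main__":
    for name, label in ((LATIN1_NAME, "Latin-1 bytes"), (UTF8_NAME, "UTF-8 bytes")):
        for enc in ("latin-1", "utf-8", "ascii"):
            print(label, "read as", enc, "->", is_identifier_in(name, enc))
```

Under these assumptions, each byte string passes only when it is read in the
encoding it was written in. The Latin-1 bytes fail to decode as UTF-8 (0xE9
followed by 0x6D), and the UTF-8 bytes read as Latin-1 decode to text
containing a non-alphanumeric character.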

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    [email protected]    http://cnswww.cns.cwru.edu/~chet/
