On 6/2/17 12:52 AM, dualbus wrote:
> - There are some questions that must be answered first:
>
> * How do you know how to decode multibyte character sequences into Unicode?
> Should UTF-8 be assumed?
It has to be the current locale.
> * Will the parsing of a script depend upon the user locale?
Only in the sense that identifiers will depend on the current locale.
> * Should this special parsing code be disabled if POSIX mode is
> enabled?
Yes. Posix requires that variable names be names, as defined below. However,
it should be possible to enable the locale-aware parsing while in Posix mode
as an extension.
> * Right now `name' or `identifier' is defined as:
>
> name: A word consisting only of alphanumeric characters and
> underscores, and beginning with an alphabetic character or an
> underscore. Also referred to as an identifier.
>
> What will the definition look like with Unicode identifiers?
Add 'from the current locale's character set'.
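
To make that concrete, here is a rough sketch of what a locale-aware name
check could look like. This is not bash's actual code: the function name
locale_legal_name is hypothetical, and it assumes the standard C wide-character
interfaces (mbrtowc, iswalpha, iswalnum) are an acceptable way to ask the
current locale which characters are alphabetic or alphanumeric.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not taken from the bash sources.  Returns 1 if S
   is a name under the locale-aware definition: alphanumeric characters
   (from the current locale's character set) and underscores, beginning
   with an alphabetic character or an underscore.  Returns 0 otherwise. */
int
locale_legal_name (const char *s)
{
  mbstate_t state;
  size_t left, n;
  wchar_t wc;
  int first = 1;

  assert (s != NULL);
  memset (&state, 0, sizeof (state));
  left = strlen (s);
  if (left == 0)
    return 0;			/* the empty string is not a name */

  while (left > 0)
    {
      /* Decode one character using the current locale's encoding. */
      n = mbrtowc (&wc, s, left, &state);
      if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
	return 0;		/* invalid or incomplete sequence */

      /* First character: alphabetic or underscore.
	 Later characters: alphanumeric or underscore. */
      if (wc != L'_' && (first ? !iswalpha (wc) : !iswalnum (wc)))
	return 0;

      first = 0;
      s += n;
      left -= n;
    }
  return 1;
}
```

A caller would be expected to run setlocale (LC_ALL, "") first so that the
user's locale is in effect. Without that call the program stays in the "C"
locale, where only ASCII letters, digits, and underscores should qualify,
which matches the existing definition.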
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU [email protected] http://cnswww.cns.cwru.edu/~chet/