On Mon, Jun 05, 2017 at 04:52:19AM -0400, George wrote:
[...]
> To hazard a guess: Each call to legal_identifier() and assignment() in
> the patched code requires copying the parameter and translating it to
> a wide-character string (with no provision for skipping the added work
> as a build option). It appears the memory allocated for these copies
> leaks (I didn't see any added calls to xfree() to go with those new
> xmallocs()), and the character type for the character conversion is
> derived from the user's locale (which means there's not a reliable
> mechanism in place to run a script in a locale whose character
> encoding doesn't match that of the script.) And he did mention "issues
> with compound assignments" as well. Those issues would need to be
> resolved.
Correct. There's also mixed use of wide-character strings and normal
strings, because that was easier to hack quickly.
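For illustration, here is a minimal sketch of the copy-and-convert step with its matching free. This is not the patch and not bash's legal_identifier(): the name legal_identifier_wide is hypothetical, plain malloc()/free() stand in for xmalloc()/xfree(), and the "letter or underscore, then letters, digits or underscores" rule is my assumption of what a wide-character check would look like.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not bash's legal_identifier()): return 1 if NAME is
   an identifier under the current LC_CTYPE locale, 0 otherwise. */
int
legal_identifier_wide (const char *name)
{
  size_t n, i;
  wchar_t *w;
  int ok;

  if (name == NULL || *name == '\0')
    return 0;

  /* First pass: count the wide characters needed (POSIX behavior for a
     NULL destination).  (size_t)-1 means NAME is not a valid multibyte
     string in the current locale. */
  n = mbstowcs (NULL, name, 0);
  if (n == (size_t)-1)
    return 0;

  /* Temporary wide-character copy; this is the allocation that has to be
     paired with a free. */
  w = malloc ((n + 1) * sizeof (wchar_t));
  if (w == NULL)
    return 0;
  mbstowcs (w, name, n + 1);

  /* Assumed rule: first character is a letter or '_', the rest are
     letters, digits or '_'. */
  ok = (iswalpha (w[0]) || w[0] == L'_');
  for (i = 1; ok && i < n; i++)
    ok = (iswalnum (w[i]) || w[i] == L'_');

  free (w);	/* release the copy on every path past the allocation */
  return ok;
}
```

The result depends entirely on the locale in effect: a caller has to run setlocale (LC_CTYPE, "") first, otherwise the program stays in the "C" locale and a name like φ will most likely be rejected. That is the same locale dependence George points out above.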
By the way, ksh93 and zsh already support Unicode identifiers:
dualbus@debian:~$ for sh in bash mksh ksh93 zsh; do LC_CTYPE=en_US.utf8 $sh -c 'φ=phi; echo $φ'; done
bash: φ=phi: command not found
$φ
mksh: φ=phi: not found
$φ
phi
phi
And all four of these shells support Unicode function names:
dualbus@debian:~$ for sh in bash mksh ksh93 zsh; do LC_CTYPE=en_US.utf8 $sh -c 'φ() { echo hi; }; φ'; done
hi
hi
hi
hi
--
Eduardo Bustamante
https://dualbus.me/