On Mon, Jun 05, 2017 at 04:52:19AM -0400, George wrote:
[...]
> To hazard a guess: Each call to legal_identifier() and assignment() in
> the patched code requires copying the parameter and translating it to
> a wide-character string (with no provision for skipping the added work
> as a build option). It appears the memory allocated for these copies
> leaks (I didn't see any added calls to xfree() to go with those new
> xmallocs()), and the character type for the character conversion is
> derived from the user's locale (which means there's not a reliable
> mechanism in place to run a script in a locale whose character
> encoding doesn't match that of the script.) And he did mention "issues
> with compound assignments" as well. Those issues would need to be
> resolved.

Correct. There's also mixed use of wide-character strings and normal
strings, because that was easier to hack quickly.

By the way, ksh93 and zsh already support Unicode identifiers:

dualbus@debian:~$ for sh in bash mksh ksh93 zsh; do LC_CTYPE=en_US.utf8 $sh -c 'φ=phi; echo $φ'; done
bash: φ=phi: command not found
$φ
mksh: φ=phi: not found
$φ
phi
phi

And all of these four support Unicode function names:

dualbus@debian:~$ for sh in bash mksh ksh93 zsh; do LC_CTYPE=en_US.utf8 $sh -c 'φ() { echo hi; }; φ'; done
hi
hi
hi
hi

-- 
Eduardo Bustamante
https://dualbus.me/
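
P.S. On the copying and leaking point: a rough sketch of how an identifier check could decode the multibyte string one character at a time with mbrtowc(), so that no wide-character copy has to be allocated (and so nothing needs a matching xfree()). This is only an illustration, not the code from the patch. The name is_legal_identifier_mb() is made up, and the letter-or-underscore rule is an assumption about what such a check would want. It still takes the encoding from the current LC_CTYPE, so it does nothing for the case of a script whose encoding differs from the locale's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not bash's legal_identifier()).
   Returns true if the multibyte string s is a non-empty identifier:
   the first character is a letter or '_', and every later character
   is a letter, a digit, or '_'.  Characters are decoded one at a time
   in the current LC_CTYPE locale, so no heap copy is made. */
bool is_legal_identifier_mb(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t remaining = strlen(s);
    bool first = true;

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        /* (size_t)-1: invalid sequence; (size_t)-2: truncated sequence.
           0 would mean an embedded NUL, which strlen() already excludes. */
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return false;

        if (first) {
            if (!(iswalpha((wint_t)wc) || wc == L'_'))
                return false;
            first = false;
        } else if (!(iswalnum((wint_t)wc) || wc == L'_')) {
            return false;
        }

        s += n;
        remaining -= n;
    }

    return !first;   /* the empty string is not an identifier */
}
```

A caller would have to run setlocale(LC_CTYPE, "") first for non-ASCII letters to be recognized. Without that call the program stays in the "C" locale, where only ASCII names should be accepted.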