Re: RFE: Please allow unicode ID chars in identifiers

George Sat, 03 Jun 2017 11:01:03 -0700

On Sat, 2017-06-03 at 01:20 -0700, L A Walsh wrote:
> Some conventions regarding character set usage have already been "solved"
> and encoded in binary properties of the characters.  For example, the
> start and continue "ID" properties are best associated with names used
> for variables. 
Referring to "Unicode Identifier and Pattern Syntax", I guess:
http://unicode.org/reports/tr31
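For a rough feel of what those properties accept, here is a small Python sketch. It leans on Python's own `str.isidentifier()`, which applies the XID_Start/XID_Continue variant of the TR31 properties. That is Python's rule, not anything Bash implements, and the helper name `is_unicode_identifier` is made up for illustration.

```python
def is_unicode_identifier(name: str) -> bool:
    """Hypothetical helper: True if `name` fits Python's TR31-based rule."""
    # str.isidentifier() requires an XID_Start character (or "_") first,
    # followed only by XID_Continue characters.
    return name.isidentifier()


# U+03C0 is GREEK SMALL LETTER PI; U+00E9 is "e" with acute accent.
for candidate in ["pi", "\u03c0", "\u03c0_2", "2\u03c0", "a b", "caf\u00e9"]:
    print(repr(candidate), is_unicode_identifier(candidate))
```

The Greek letter is accepted on its own and as a prefix, while a leading digit or an embedded space is rejected, much like the ASCII-only rule most shells use today.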
>  And while the symbol for pi (𝛑) may look similar, it would
> most likely be used where numbers (and numeric constants) are used, while
> the greek letter would be used in identifiers.
> 
>


In some fonts, the two may look not only similar but identical. I
brought it up mainly to acknowledge that there are issues in Unicode
handling that make this potentially more complicated than "don't
disallow it". In my opinion, to do it right the syntax should at least
recognize Unicode whitespace, which in turn requires an explicit
decision that the source text is Unicode (and not generically processed
byte sequences outside the ASCII range).
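
To illustrate why that decision matters, here is a minimal Python sketch (not Bash code, and the sample line is invented): the same input splits into words differently depending on whether it is handled as Unicode text or as raw bytes.

```python
# Two assignments separated by U+2003 EM SPACE instead of an ASCII space.
line = "FOO=1\u2003BAR=2"

# Treated as text: str.split() with no argument splits on Unicode whitespace.
as_text = line.split()

# Treated as bytes: bytes.split() with no argument only knows ASCII
# whitespace, so the UTF-8 encoding of U+2003 stays inside a single word.
as_bytes = line.encode("utf-8").split()

print(as_text)
print(as_bytes)
```

A parser that works on bytes sees one word here, while one that has decided the source is Unicode sees two.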

(The question of whether to support Unicode mathematical operators or
Unicode quotation marks introduces a lot of interesting possibilities,
but that's a whole other ball of wax.)

Problems like Unicode normalization, and the question of what
constitutes equivalence within the programming language, can complicate
things. There is a series of trade-offs between keeping the
implementation relatively simple and supporting equivalence where the
user may reasonably expect it. Something like this could get
complicated, depending on how far the design goes in supporting
equivalence for semi-redundant Unicode characters. I don't think a
programming language necessarily needs to go too far down that rabbit
hole, but I wanted to acknowledge the issue.
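
As a concrete illustration of the equivalence question, here is a short Python sketch using the standard `unicodedata` module. The sample strings are my own, and this only shows what the Unicode normalization forms do, not what Bash does or should do.

```python
import unicodedata

composed = "caf\u00e9"        # "e with acute" as one code point (U+00E9)
decomposed = "cafe\u0301"     # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The two spellings render alike but differ code point by code point.
same_raw = composed == decomposed
# Canonical normalization (NFC) makes them compare equal.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))

greek_pi = "\u03c0"           # GREEK SMALL LETTER PI
math_bold_pi = "\U0001d6d1"   # MATHEMATICAL BOLD SMALL PI

# NFC keeps the two pi characters distinct; the compatibility form NFKC
# folds the mathematical variant into the plain Greek letter.
pi_same_nfc = unicodedata.normalize("NFC", math_bold_pi) == greek_pi
pi_same_nfkc = unicodedata.normalize("NFKC", math_bold_pi) == greek_pi

print(same_raw, same_nfc, pi_same_nfc, pi_same_nfkc)
```

Whether a language compares names raw, under NFC, or under NFKC decides whether those two pi characters name the same variable, which is the trade-off between simplicity and expected equivalence.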
> I'm glad some people are willing to discuss things rather than run around
> asserting that the sky will *cost* something ... Whether or not it cost
> something shouldn't prevent people from forming ideas that they might
> find desirable in the future.  It certainly doesn't mean such features are
> expected next month, or even "anytime" by a specific person (if they have
> no interest in the work) I'd support them not doing it as long as they
> allow someone more interested to do the work.  Being open to doing it
> yourself isn't required to be open to seeing something grow in specific
> directions...
> 
> 

To be fair, that reaction is understandable. When someone like me shows
up and starts trying to influence the direction of a project, even if I
contribute code for the changes I want to see, will I be the one
maintaining that code in 5 years? To be quite honest, in my case the
answer is almost certainly "no". When I advocate for a feature to be
added to Bash, I am probably not going to be paying the bulk of the
price for it. Willingness to contribute code is a very low bar.

But speaking vaguely about "the future" and open-ended expectations
with respect to time is probably counterproductive. As it is, Bash bug
reports and feature requests are neglected for years on end, and new
features make their way to Bash years after they're pioneered in ksh. A
vague "in the future" could easily find us back here, ten years from
now, still wondering when Bash will support Unicode in parameter names.
The best we can do is make a good case for the features we want to see
integrated, and do as much as we can to at least reduce the _initial_
cost of the feature.
