The core team recently met to discuss PR609 - Refining identifier and operator 
symbology:
https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md

The proposal correctly observes that the partitioning of Unicode code points 
into identifiers and operators is a mess in some cases.  It really is an 
outright bug for 🙂 to be an identifier, but ☹️ to be an operator.  That said, 
the proposal itself is complicated and is defined in terms of a bunch of 
Unicode classes that may evolve in the “wrong way for Swift” in the future.

The core team would really like to get this sorted out for Swift 5, and sooner 
is better than later :-).  Because it seems that this is a really hard problem 
and that perfection is becoming the enemy of good 
<https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team 
requests the creation of a new proposal with a different approach.  The general 
observation is that there are three kinds of characters: things that are 
obviously identifiers, things that are obviously math operators, and things 
that are non-obvious.  Things that are non-obvious can be made into invalid 
code points, and legislated later in follow-up proposals if/when someone cares 
to argue for them.


To make progress on this, we suggest a few separable steps:

First, please split out the changes to the ASCII characters (e.g. the . and \ 
operator parsing rules) into their own (small) proposal.  They are unrelated to 
the Unicode changes, so that proposal can make progress independently.


Second, someone should take a look at the concrete set of Unicode identifiers 
that are accepted by Swift 4 and write a new proposal that splits them into 
three groups: those that are clearly identifiers (which become identifiers), 
those that are clearly operators (which become operators), and those that are 
unclear or don’t matter (these become invalid code points).

I suggest that the criteria be based on utility for Swift code, not on the 
underlying Unicode classification.  For example, the discussion thread for 
PR609 mentions that the T character in “xᵀ” is defined in Unicode as a 
Latin “letter”.  Despite that, its use in Swift would clearly be as a postfix 
operator, so we should classify it as an operator.
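
As a rough illustration of the three-way split (a sketch only, not part of 
PR609 or any proposal): the names SymbolClass and classify and the tables 
below are hypothetical, and the few code points listed are just the examples 
mentioned in this email.  The point is that membership comes from a 
hand-curated list judged by utility for Swift code, so ᵀ (U+1D40) lands in the 
operator set even though Unicode calls it a letter.

```swift
// Hypothetical three-way classification; not an existing Swift API.
enum SymbolClass {
    case identifier      // obviously an identifier character
    case operatorSymbol  // obviously a math operator
    case invalid         // non-obvious: rejected until someone argues for it
}

// Illustrative ranges only; a real proposal would enumerate these carefully.
let identifierRanges: [ClosedRange<UInt32>] = [
    0x0041...0x005A,  // A-Z
    0x0061...0x007A,  // a-z
    0x00C0...0x00D6,  // accented Latin letters (skipping U+00D7 ×)
    0x00D8...0x00F6,  // accented Latin letters (skipping U+00F7 ÷)
    0x00F8...0x00FF,  // accented Latin letters
    0x03B1...0x03C9   // Greek lowercase, e.g. alpha
]

// Emoji are judged one code point at a time rather than by range.
let identifierScalars: Set<UInt32> = [
    0x1F642,  // 🙂
    0x2639    // ☹ (a face, so an identifier here)
]

// Operators chosen by utility, not by Unicode category.
let operatorScalars: Set<UInt32> = [
    0x002B,  // +
    0x2212,  // − (minus sign)
    0x00D7,  // ×
    0x00F7,  // ÷
    0x1D40   // ᵀ, a "letter" in Unicode but a postfix operator in practice
]

func classify(_ scalar: Unicode.Scalar) -> SymbolClass {
    let value = scalar.value
    if operatorScalars.contains(value) {
        return .operatorSymbol
    }
    if identifierScalars.contains(value)
        || identifierRanges.contains(where: { $0.contains(value) }) {
        return .identifier
    }
    // Anything not explicitly listed is an invalid code point for now.
    return .invalid
}

print(classify("\u{1D40}"))   // ᵀ
print(classify("\u{1F642}"))  // 🙂
print(classify("\u{2800}"))   // Braille pattern, not listed above
```

Because unlisted code points fall through to invalid, a later proposal can 
accept a new character by adding one table entry, which does not break 
existing source.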

Other suggestions:
 - Math symbols are operators, except for those primarily used as identifiers 
like “alpha”.  If there are any characters that are used for both, this 
proposal should make them invalid.
 - While there may be useful ranges for some identifiers (e.g. to handle 
European accented characters), the Emoji range should probably have each 
code point independently judged, and currently unassigned code points should 
not get a meaning defined for them.
 - Unicode “faces”, “people”, “animals”, etc. are all identifiers.
 - In order to reduce the scope of the proposal, it is a safe default to 
exclude characters that are unlikely to be used by Swift code today, including 
Braille, weird currency symbols, or any set of characters that are so broken 
and useless in Swift 4 that it isn’t worth worrying about.
 - The proposal is likely to turn a large number of code points into rejected 
characters.  In the discussions, some people will be tempted to argue endlessly 
about individual rejections.  To control that, we can require that people point 
out an example where the character is already in use, or where it has a clear 
application to a domain that is known today: the discussion needs to be 
grounded and practical, not theoretical.


Third, if there is interest sometime in the future, we can have subsequent 
proposals that expand the range of accepted code points, motivated by the 
specific application domain that cares about them.  These proposals will not be 
source breaking, so they can happen at any time.


Is anyone interested in helping to push this effort forward?

-Chris
