Re: [swift-evolution] A path forward on rationalizing unicode identifiers and operators

Vladimir.S via swift-evolution Mon, 02 Oct 2017 05:00:04 -0700

On 02.10.2017 8:30, Kenny Leung via swift-evolution wrote:

I guess theoretically you could have two variables that look alike, but are actually different values, allowing you to insert some obfuscated malicious code somehow.
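To make the look-alike concern concrete, here is a minimal Swift sketch. It uses string literals rather than real identifiers so the difference stays visible in the source; the names `latinName`, `cyrillicName`, and `scalarDump` are made up for illustration and do not come from the thread.

```swift
// Latin "a" is U+0061; Cyrillic "а" is U+0430. Most fonts draw them alike.
let latinName = "\u{0061}dmin"
let cyrillicName = "\u{0430}dmin"

// Lists the code points behind each visible character (hex, not zero-padded).
func scalarDump(_ text: String) -> [String] {
    return text.unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
}

print(latinName == cyrillicName)      // false: the two names are distinct
print(scalarDump(latinName)[0])       // U+61
print(scalarDump(cyrillicName)[0])    // U+430
```

If the same two spellings were used as identifiers, a reader would see one name while the compiler would see two.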

Also, IIRC, a similar problem exists with the Right-To-Left "modifier": when it is inserted inside a variable name, the name you *see* (in a browser or editor) is not the same name that will be used *by the compiler*. I can't find the link right now, but if it would be helpful, I will try to find it.
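One way a tool could flag this, sketched in Swift under my own assumptions: scan a name for Unicode bidirectional control characters. The function name `containsBidiControl` is hypothetical, and the list of code points (U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069) is my reading of the Unicode bidirectional formatting characters, not something taken from the thread or from the Swift compiler.

```swift
// Returns true if the text contains a bidirectional formatting character,
// such as RIGHT-TO-LEFT OVERRIDE (U+202E), which can reorder what is displayed.
func containsBidiControl(_ text: String) -> Bool {
    return text.unicodeScalars.contains { scalar in
        switch scalar.value {
        case 0x200E, 0x200F, 0x202A...0x202E, 0x2066...0x2069:
            return true
        default:
            return false
        }
    }
}

let plainName = "admin"
let trickyName = "user\u{202E}nimda"   // the override changes the display order

print(containsBidiControl(plainName))    // false
print(containsBidiControl(trickyName))   // true
```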


Vladimir.

-Kenny

On Oct 1, 2017, at 10:01 PM, Chris Lattner <[email protected]<mailto:[email protected]>> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <[email protected] <mailto:[email protected]>> wrote:
Hi All.

I’d like to help as well. I have fun with operators.
There is also the issue of code security with invisible unicode characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?


-Chris

(They should make a Coding font that ensures all characters look different.) Was that ever resolved? Googling, I found this:


https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html

Which seems to have been left at this:

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229

Should we throw all of this into the same pot, and make any characters that aren’t on the approved list illegal?


-Kenny

On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution <[email protected] <mailto:[email protected]>> wrote:

I’m happy to participate in the reshaping of the proposal. It would be nice to gather a group of people again to help drive it forward.

That said, it’s unclear to me that superscript T is clearly an operator, any more than would be superscript H (Hermitian), superscript 2, superscript 3, etc. But at any rate, this would be a discussion for the future workgroup.

I would strongly advocate that the things-that-are-identifiers group be strongly tied to the existing, complete Unicode standard for such, and that the critical parts of the previous document about normalization be retained.
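A small Swift sketch of why normalization matters for identifiers: the same visible text can be encoded in more than one way. The variable names are illustrative only. Swift's `String` comparison treats canonically equivalent strings as equal; whether the compiler applies the same equivalence to identifiers is the question the normalization rules would have to settle, and this sketch makes no claim about that.

```swift
let precomposed = "caf\u{00E9}"   // "é" as a single scalar, U+00E9
let decomposed = "cafe\u{0301}"   // "e" followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                          // true
print(precomposed.unicodeScalars.count)                   // 4
print(decomposed.unicodeScalars.count)                    // 5
print(Array(precomposed.utf8) == Array(decomposed.utf8))  // false
```

Without a normalization rule, a byte-wise comparison of the two spellings would treat them as different names even though they are displayed identically.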

On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution <[email protected] <mailto:[email protected]>> wrote:



    The core team recently met to discuss PR609 - Refining identifier and
    operator symbology:
    
https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md

    The proposal correctly observes that the partitioning of unicode codepoints
    into identifiers and operators is a mess in some cases.  It really is an
    outright bug for 🙂 to be an identifier, but ☹️ to be an operator.  That
    said, the proposal itself is complicated and is defined in terms of a bunch
    of unicode classes that may evolve in the “wrong way for Swift” in the future.
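The 🙂 versus ☹️ inconsistency can be illustrated with a rough Swift sketch. The two ranges below are excerpts that I believe match the Swift 4 lexical grammar (U+2500 to U+2775 as operator characters, U+10000 to U+1FFFD as identifier characters); treat them as assumptions and check them against the language reference. The `Kind` enum and `classify` function are hypothetical names, not part of any proposal.

```swift
// Assumed excerpts of the Swift 4 lexical grammar, not the full tables.
let operatorHeadExcerpt: ClosedRange<UInt32> = 0x2500...0x2775
let identifierHeadExcerpt: ClosedRange<UInt32> = 0x10000...0x1FFFD

enum Kind { case identifier, op, other }

// Classifies a scalar by the two excerpted ranges only.
func classify(_ scalar: Unicode.Scalar) -> Kind {
    if operatorHeadExcerpt.contains(scalar.value) { return .op }
    if identifierHeadExcerpt.contains(scalar.value) { return .identifier }
    return .other
}

print(classify("\u{1F642}"))   // identifier (🙂, SLIGHTLY SMILING FACE)
print(classify("\u{2639}"))    // op (☹, WHITE FROWNING FACE)
```

Under these assumed ranges the two faces land in different groups purely because of where their code points happen to sit.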

    The core team would really like to get this sorted out for Swift 5, and
    sooner is better than later :-).  Because it seems that this is a really hard
    problem and that perfection is becoming the enemy of good
    <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team
    requests the creation of a new proposal with a different approach.  The
    general observation is that there are three kinds of characters: things that
    are obviously identifiers, things that are obviously math operators, and
    things that are non-obvious.  Things that are non-obvious can be made into
    invalid code points, and legislated later in follow-up proposals if/when
    someone cares to argue for them.


    To make progress on this, we suggest a few separable steps:

    First, please split out the changes to the ASCII characters (e.g. . and \
    operator parsing rules) to its own (small) proposal, since it is unrelated to
    the unicode changes, and can make progress on that proposal independently.


    Second, someone should take a look at the concrete set of unicode identifiers
    that are accepted by Swift 4 and write a new proposal that splits them into
    the three groups: those that are clearly identifiers (which become
    identifiers), those that are clearly operators (which become operators), and
    those that are unclear or don’t matter (these become invalid code points).

    I suggest that the criteria be based on *utility for Swift code*, not on the
    underlying unicode classification.  For example, the discussion thread for
    PR609 mentions that the T character in “  xᵀ  ” is defined in unicode as a
    latin “letter”.  Despite that, its use in Swift would clearly be as a postfix
    operator, so we should classify it as an operator.
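For illustration, here is what a transpose written as a postfix operator could look like in Swift. As far as I know, ᵀ (U+1D40) is currently lexed as an identifier character, so this sketch substitutes ⊤ (U+22A4), which I believe falls in an existing operator range; that substitution, the `Matrix` type, and the assumption of rectangular rows are all mine and not part of the proposal.

```swift
// ⊤ (U+22A4) stands in for ᵀ, which is assumed not to be an operator today.
postfix operator ⊤

struct Matrix {
    var rows: [[Int]]
}

// Transposes a matrix; assumes every row has the same length.
postfix func ⊤ (matrix: Matrix) -> Matrix {
    guard let firstRow = matrix.rows.first else { return matrix }
    let columns = firstRow.indices.map { column in
        matrix.rows.map { row in row[column] }
    }
    return Matrix(rows: columns)
}

let m = Matrix(rows: [[1, 2, 3], [4, 5, 6]])
let t = m⊤
print(t.rows)   // [[1, 4], [2, 5], [3, 6]]
```

If ᵀ were reclassified as an operator, the same declaration could use it directly, so `xᵀ` would read the way it does in mathematical notation.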

    Other suggestions:
     - Math symbols are operators excepting those primarily used as identifiers
    like “alpha”.  If there are any characters that are used for both, this
    proposal should make them invalid.
     - While there may be useful ranges for some identifiers (e.g. to handle
    european accented characters), the Emoji range should probably have each
    codepoint independently judged, and currently unassigned codepoints should
    not get a meaning defined for them.
     - Unicode “faces”, “people”, “animals” etc are all identifiers.
     - In order to reduce the scope of the proposal, it is a safe default to
    exclude characters that are unlikely to be used by Swift code today,
    including Braille, weird currency symbols, or any set of characters that are
    so broken and useless in Swift 4 that it isn’t worth worrying about.
     - The proposal is likely to turn a large number of code points into rejected
    characters.  In the discussions, some people will be tempted to argue
    endlessly about individual rejections.  To control that, we can require that
    people point out an example where the character is already in use, or where
    it has a clear application to a domain that is known today: the discussion
    needs to be grounded and practical, not theoretical.


    Third, if there is interest sometime in the future, we can have subsequent
    proposals that expand the range of accepted code points, motivated by the
    specific application domain that cares about them.  These proposals will not
    be source breaking, so they can happen at any time.


    Is anyone interested in helping to push this effort forward?

    -Chris

    _______________________________________________
    swift-evolution mailing list
    [email protected] <mailto:[email protected]>
    https://lists.swift.org/mailman/listinfo/swift-evolution
