This forks from the discussion in “A path forward on rationalizing unicode
identifiers and operators”, where it was suggested to put this in a new thread.
Background:
Swift partitions the character set into operators and identifiers to aid in
efficient parsing. This has the unfortunate side effect that the language spec
shoulders the burden of classifying thousands of Unicode characters,
and it must do so universally across all users and contexts.
There are many characters with ambiguous usage, such as denoting the transpose
of matrix A as Aᵀ. The notation is specifically using a superscript T, but
this is also fundamentally a Latin letter, and the Unicode code point is found
in the Phonetic Extensions block, not the math symbols block. In general, many
symbols could refer either to an action (operator), or the result of that
action (identifier), or have disparate domain-specific meanings. Should the
language spec really be in the business of deciding the ‘right’ use of each
character like this?
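To make the status quo concrete, here is a minimal sketch. It assumes that ᵀ
(U+1D40) falls in one of the identifier ranges of the current lexical grammar;
Matrix2 is a hypothetical type invented purely for illustration:

```swift
// Hypothetical 2x2 matrix type, for illustration only.
struct Matrix2: Equatable {
    var a, b, c, d: Double

    // Swap the off-diagonal entries.
    var transposed: Matrix2 { Matrix2(a: a, b: c, c: b, d: d) }
}

let m = Matrix2(a: 1, b: 2, c: 3, d: 4)

// Assumption: `ᵀ` is classified as an identifier character today, so `mᵀ` is
// one ordinary identifier, not `m` followed by a postfix operator.
let mᵀ = m.transposed
```

Under that classification, the character can decorate a name that holds the
result of a transpose, but it cannot be declared as the transpose operator.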
I also assert that much of the bad reputation of custom operators comes from
languages with limited operator character sets, which force developers
to overload standard operators with surprising effects, instead of choosing a
symbol which is both unique and better recognized for the task at hand.
Allowing developers to choose apt operator symbols is akin to encouraging
descriptive identifiers. Writing good code is all about making these choices
appropriately, and that requires context, which only the end developer has.
To be clear, this will most likely be relegated to niche applications serving
domain experts. As established below, the default behavior is to opt out of
exotic operator choices. But for a user who does want them, it is better to
provide the right tools for the purpose.
Goals:
1. Performance: file-local operator decisions (don’t require loading all the
imports first)
2. Maintenance: improve operator auditing/discoverability
3. Functionality: let users write what they want without lobbying this list
4. Well defined: aid in resolving conflicts between modules
Pitch:
Enable users to ‘import’ specific operator symbols on a per-file basis,
updating the operator set used for parsing that file.
In the simplest form this would look like:
import operator ᵀ
This is only needed for “non-standard” operators. But by providing this escape
hatch, we can conservatively limit the “standard” operators to a smaller,
well-known set and avoid a lot of debate without sacrificing expressibility.
When this import is encountered, any matching operator declarations are
made available simply because the character is interpreted as an operator. (i.e. all
modules’ operators are loaded as normal, but the compiler can only make the
connection in files that opt-in to interpreting that character as an operator.)
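The per-file effect could be modeled roughly as follows. This is a toy
tokenizer, not the compiler's lexer; Token, standardOperators, and tokenize
are hypothetical names, and the four-character standard set is an assumption
made for brevity:

```swift
// Toy model of the pitch; not how the Swift compiler is implemented.
enum Token: Equatable {
    case identifier(String)
    case op(String)
}

// Deliberately tiny "standard" operator set, assumed for illustration.
let standardOperators: Set<Character> = ["+", "-", "*", "/"]

// Splits `source` into identifier and operator runs. Characters listed in
// `importedOperators` are treated as operators for this "file" only.
func tokenize(_ source: String, importedOperators: Set<Character> = []) -> [Token] {
    let operators = standardOperators.union(importedOperators)
    var tokens: [Token] = []
    var text = ""
    var textIsOperator = false

    // Emit the run collected so far, if any.
    func flush() {
        guard !text.isEmpty else { return }
        tokens.append(textIsOperator ? .op(text) : .identifier(text))
        text = ""
    }

    for ch in source {
        if ch.isWhitespace { flush(); continue }
        let isOperator = operators.contains(ch)
        if isOperator != textIsOperator { flush() }
        textIsOperator = isOperator
        text.append(ch)
    }
    flush()
    return tokens
}

// Without the import, `Aᵀ` is a single identifier.
let withoutImport = tokenize("Aᵀ")
// With `import operator ᵀ` in effect, the same text splits in two.
let withImport = tokenize("Aᵀ", importedOperators: ["ᵀ"])
```

The point of the sketch is that the classification depends only on the
current file's import list, so no other module has to be loaded to tokenize it.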
Conversely, conflicting module identifiers become inaccessible following such
an import, and a good API would hopefully supply less exotic alternative
interfaces for both cases. In the worst case, the user could write an extension
in a new file with the complementary character choice and remap the offending
operators or identifiers as they see fit.
Regarding operator declarations, one could suggest that the declaration itself
could update the operator character set for that file. However, I suggest
always requiring the import operator statement (for non-standard operators),
partly to surface guidance when a choice of operator will require explicit
imports from other files. This also reduces the potential for obfuscation by
operators with visually similar representations, as the import list would draw
attention to this chicanery.
Advanced Pitch:
The pitch above provides the “minimum viable product”, but we might like to take
this a little further and make it module-specific:
import matrixlib (operators: [ᵀ,·,⊗])
Again, only “non-standard” operators need to be listed; the “standard”
operators would be imported the same as today. But now as readers we can see where
special operators are coming from, and potentially filter competing
declarations from different modules. I also like that an operator family can
be listed on a single line rather than potentially a dozen lines covering
various combinations. A module vendor can concisely document its operator list
and make it easy to maintain and discover.
This syntax mimics a module “init” call, which could be a powerful concept for
future extensions. For example, we could introduce “standardOperators: false”
to disable the automatic import of standard operator overloads, which some
users might appreciate regardless of character set issues. (e.g. users could
select between conflicting standard operators in different modules, or just
have peace of mind that there are no surprises.)
I anticipate this form would take a bit more work to implement, as Swift would
need to filter the visibility of operators per module based on the
declarations in the current file. However, these two versions can work
together. The first form provides a global import across modules, and the
module-specific form can be added later.
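As a rough sketch of the per-module filtering, under the assumption that
standard operators stay visible for every imported module while non-standard
ones must be listed: OperatorDeclaration, standardSymbols, and
visibleOperators are hypothetical names, and otherlib is an invented second
module.

```swift
// Toy model of per-module operator visibility; all names are hypothetical.
struct OperatorDeclaration: Equatable {
    let module: String
    let symbol: Character
}

// Deliberately tiny "standard" operator set, assumed for illustration.
let standardSymbols: Set<Character> = ["+", "-", "*", "/"]

// `imports` maps each module imported by the current file to the
// non-standard operators the file requested from that module.
func visibleOperators(
    among declarations: [OperatorDeclaration],
    imports: [String: Set<Character>]
) -> [OperatorDeclaration] {
    return declarations.filter { declaration in
        // Operators from modules the file never imported are not visible.
        guard let requested = imports[declaration.module] else { return false }
        return standardSymbols.contains(declaration.symbol)
            || requested.contains(declaration.symbol)
    }
}

let declarations = [
    OperatorDeclaration(module: "matrixlib", symbol: "ᵀ"),
    OperatorDeclaration(module: "matrixlib", symbol: "+"),
    OperatorDeclaration(module: "otherlib", symbol: "ᵀ"),
]

// Models `import matrixlib (operators: [ᵀ])` next to a plain `import otherlib`.
let visible = visibleOperators(
    among: declarations,
    imports: ["matrixlib": ["ᵀ"], "otherlib": []]
)
```

Here only the two matrixlib declarations remain visible, so the competing ᵀ
from the other module is filtered out.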
What do people think?
Thanks,
-Ethan
_______________________________________________
swift-evolution mailing list
https://lists.swift.org/mailman/listinfo/swift-evolution