Re: Next up for patterns: type patterns in switch

Remi Forax Thu, 23 Jul 2020 11:39:26 -0700

Right now: add only the type Pattern and the null Pattern so people can test 
and report if we go in the right direction or not.


Don't do deconstruction now ! We are not ready :)

In next releases? Add constant Pattern and static Pattern. Ban instance pattern 
because you don't want to match something that depends on a mutable states (my 
worst experience with Scala).

On guards, they do not work well with exhaustiveness. Still not sure they are 
useful.
On nullity, I prefer the total pattern to be explicit instead of implicit 
(given we have var, the type on the expression taken by the switch can be fuzzy 
for someone doing a quick look to the code), case any x, case _ x or whatever 
the syntax is, is more readable IMO.

And I agree on banning total patterns in instanceof.

Remi



On July 23, 2020 4:20:01 PM UTC, Brian Goetz <[email protected]> wrote:
>In case I wasn't clear, this was a proposal to proceed on _right now_. 
>
>There's been no comments so far; should I take that as "perfect, ship
>it?"
>
>
>On 6/24/2020 10:44 AM, Brian Goetz wrote:
>> There are a lot of directions we could take next for pattern 
>> matching.  The one that builds most on what we've already done, and 
>> offers significant incremental expressiveness, is extending the type 
>> patterns we already have to a new context: switch.  (There is still 
>> plenty of work to do on deconstruction patterns, pattern assignment, 
>> etc, but these require more design work.)
>>
>> Here's an overview of where I think we are here.
>>
>> [JEP 305][jep305] introduced the first phase of [pattern 
>> matching][patternmatch]
>> into the Java language.  It was deliberately limited, focusing on
>only 
>> one kind
>> of pattern (type test patterns) and one linguistic context
>(`instanceof`).
>> Having introduced the concept to Java developers, we can now extend 
>> both the
>> kinds of patterns and the linguistic context where patterns are used.
>>
>> ## Patterns in switch
>>
>> The obvious next context in which to introduce pattern matching is 
>> `switch`;  a
>> switch using patterns as `case` labels can replace `if .. else if` 
>> chains with
>> a more direct way of expressing a multi-way conditional.
>>
>> Unfortunately, `switch` is one of the most complex, irregular 
>> constructs we have
>> in Java, so we must teach it some new tricks while avoiding some 
>> existing traps.
>> Such tricks and traps may include:
>>
>> **Typing.**  Currently, the operand of a `switch` may only be one of
>the
>> integral primitive types, the box type of an integral primitive, 
>> `String`, or an
>> `enum` type.  (Further, if the `switch` operand is an `enum` type,
>the 
>> `case`
>> labels must be _unqualified_ enum constant names.)  Clearly we can 
>> relax this
>> restriction to allow other types, and constrain the case labels to
>only be
>> patterns that are applicable to that type, but it may leave a seam of
>
>> "legacy"
>> vs "pattern" switch, especially if we do not adopt bare constant 
>> literals as
>> the denotation of constant patterns.  (We have confronted this issue 
>> before with
>> expression switch, and concluded that it was better to rehabilitate 
>> the `switch`
>> we have rather than create a new construct, and we will make the same
>
>> choice
>> here, but the cost of this is often a visible seam.)
>>
>> **Parsing.**  The grammar currently specifies that the operand of a 
>> `case` label
>> is a `CaseConstant`, which casts a wide syntactic net, later narrowed
>with
>> post-checks after attribution.  This means that, since parsing is
>done 
>> before we
>> know the type of the operand, we must be watchful for ambiguities
>between
>> patterns and expressions (and possibly refine the production for 
>> `case` labels.)
>>
>> **Nullity.**  The `switch` construct is currently hostile to `null`, 
>> but some
>> patterns do match `null`, and it may be desirable if nulls can be
>handled
>> within a suitably crafted `switch`.
>>
>> **Exhaustiveness.**  For switches over the permitted subtypes of 
>> sealed types,
>> we will want to be able to do exhaustiveness analysis -- including
>for 
>> nested
>> patterns (i.e., if `Shape`  is `Circle` or `Rect`, then `Box(Circle 
>> c)` and
>> `Box(Rect r)` are exhaustive on `Box<Shape>`.)
>>
>> **Fallthrough.**  Fallthrough is everyone's least favorite feature of
>
>> `switch`,
>> but it exists for a reason.  (The mistake was making fallthrough the 
>> default
>> behavior, but that ship has sailed.)  In the absence of an OR pattern
>> combinator, one might find fallthrough in switch useful in
>conjunction 
>> with
>> patterns:
>>
>> ```
>> case Box(int x):
>> case Bag(int x):
>>     // use x
>> ```
>>
>> However, it is likely that we will, at least initially, disallow 
>> falling out
>> of, or into, a case label with binding variables.
>>
>> #### Translation
>>
>> Switches on primitives and their wrapper types are translated using
>the
>> `tableswitch` or `lookupswitch` bytecodes; switches on strings and 
>> enums are
>> lowered in the compiler to switches involving hash codes (for
>strings) or
>> ordinals (for enums.)
>>
>> For switches on patterns, we would need a new strategy, one likely 
>> built on
>> `invokedynamic`, where we lower the cases to a densely numbered `int`
>
>> switch,
>> and then invoke a classifier function with the operand which tells us
>
>> the first
>> case number it matches.  So a switch like:
>>
>> ```
>> switch (o) {
>>     case P: A
>>     case Q: B
>> }
>> ```
>>
>> is lowered to:
>>
>> ```
>> int target = indy[BSM=PatternSwitch, args=[P,Q]](o)
>> switch (target) {
>>     case 0: A
>>     case 1: B
>> }
>> ```
>>
>> A symbolic description of the patterns is provided as the bootstrap 
>> argument
>> list, which builds a decision tree based on analysis of the patterns 
>> and their
>> target types.
>>
>> #### Guards
>>
>> No matter how rich our patterns are, it is often the case that we
>will 
>> want
>> to provide additional filtering on the results of a pattern:
>>
>> ```
>> if (shape instanceof Cylinder c && c.color() == RED) { ... }
>> ```
>>
>> Because we use `instanceof` as part of a boolean expression, it is
>easy to
>> narrow the results by conjoining additional checks with `&&`.  But in
>
>> a `case`
>> label, we do not necessarily have this opportunity.  Worse, the 
>> semantics of
>> `switch` mean that once a `case` label is selected, there is no way
>to say
>> "oops, forget it, keep trying from the next label".
>>
>> It is common in languages with pattern matching to support some form 
>> of "guard"
>> expression, which is a boolean expression that conditions whether the
>case
>> matches, such as:
>>
>> ```
>> case Point(var x, var y)
>>     __where x == y: ...
>> ```
>>
>> Bindings from the pattern would have to be available in guard
>expressions.
>>
>> Syntactic options (and hazards) for guards abound; users would 
>> probably find it
>> natural to reuse `&&` to attach guards to patterns; C# has chosen 
>> `when` for
>> introducing guards; we could use `case P if (e)`, etc. Whatever we do
>
>> here,
>> there is a readability risk,  as the more complex guards are, the 
>> harder it is
>> to tell where the case label ends and the "body" begins.  (And worse 
>> if we allow
>> switch expressions inside guards.)
>>
>> An alternate to guards is to allow an imperative `continue` statement
>in
>> `switch`, which would mean "keep trying to match from the next 
>> label."  Given
>> the existing semantics of `continue`, this is a natural extension,
>but 
>> since
>> `continue` does not currently have meaning for switch, some work
>would 
>> have to
>> be done to disambiguate continue statements in switches enclosed in 
>> loops.  The
>> imperative version is strictly more expressive than most reasonable 
>> forms of the
>> declarative version, but users are likely to prefer the declarative 
>> version.
>>
>> ## Nulls
>>
>> Almost no language design exercise is complete without some degree of
>
>> wrestling
>> with `null`.  As we define more complex patterns than simple type 
>> patterns, and
>> extend constructs such as `switch` (which have existing opinions
>about 
>> nullity)
>> to support patterns, we need to have a clear understanding of which 
>> patterns
>> are nullable, and separate the nullity behaviors of patterns from the
>
>> nullity
>> behaviors of those constructs which use patterns.
>>
>> ## Nullity and patterns
>>
>> This topic has a number of easily-tangled concerns:
>>
>>  - **Construct nullability.**  Constructs to which we want to add
>pattern
>>    awareness (`instanceof`, `switch`) already have their own opinion
>about
>>    nulls.  Currently, `instanceof` always says false when presented
>with a
>>    `null`, and `switch` always NPEs.  We may, or may not, wish to 
>> refine these
>>    rules in some cases.
>>  - **Pattern nullability.**  Some patterns clearly would never match 
>> `null`
>>    (such as deconstruction patterns), whereas others (an "any" 
>> pattern, and
>>    surely the `null` constant pattern) might make sense to match
>null.
>>  - **Refactoring friendliness.**  There are a number of cases that we
>
>> would like
>>    to freely refactor back and forth, such as certain chains of `if 
>> ... else if`
>>    with switches.
>>  - **Nesting vs top-level.**  The "obvious" thing to do at the top 
>> level of a
>>    construct is not always the "obvious" thing to do in a nested 
>> construct.
>>  - **Totality vs partiality.**  When a pattern is partial on the 
>> operand type
>>    (e.g., `case String` when the operand of switch is `Object`), it
>is 
>> almost
>>    never the case we want to match null (except in the case of the
>`null`
>>    constant pattern), whereas when a pattern is total on the operand 
>> type (e.g.,
>>    `case Object` in the same example), it is more justifiable to
>match 
>> null.
>>  - **Inference.**  It would be nice if a `var` pattern were simply 
>> inference for
>>    a type pattern, rather than some possibly-non-denotable union.
>>
>> As a starting example, consider:
>>
>> ```
>> record Box(Object o) { }
>>
>> Box box = ...
>> switch (box) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(var o):
>> }
>> ```
>>
>> It would be highly confusing and error-prone for either of the first
>two
>> patterns to match `Box(null)` -- given that `Chocolate` and `Frog` 
>> have no type
>> relation, it should be perfectly safe to reorder the two. But,
>because 
>> the last
>> pattern seems so obviously total on boxes, it is quite likely that 
>> what the
>> author wants is to match all remaining boxes, including those that 
>> contain null.
>> (Further, it would be terrible if there were _no_ way to say "Match 
>> any `Box`,
>> even if it contains `null`.  (While one might initially think this 
>> could be
>> repaired with OR patterns, imagine that `Box` had _n_ components -- 
>> we'd need to
>> OR together _2^n_ patterns, with complex merging, to express all the 
>> possible
>> combinations of nullity.))
>>
>> Scala and C# took the approach of saying that "var" patterns are not 
>> just type
>> inference, they are "any" patterns -- so `Box(Object o)` matches
>boxes
>> containing a non-null payload, where `Box(var o)` matches all boxes. 
>This
>> means, unfortunately, that `var` is not mere type inference -- which 
>> complicates
>> the role of `var` in the language considerably.  Users should not
>have 
>> to choose
>> between the semantics they want and being explicit about types; these
>
>> should be
>> orthogonal choices.  The above `switch` should be equivalent to:
>>
>> ```
>> Box box = ...
>> switch (box) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(Object o):
>> }
>> ```
>>
>> and the choice to use `Object` or `var` should be solely one of 
>> whether the
>> manifest types are deemed to improve or impair readability.
>>
>> #### Construct and pattern nullability
>>
>> Currently, `instanceof` always says `false` on `null`, and `switch`
>always
>> throws on `null`.  Whatever null opinions a construct has, these are 
>> applied
>> before we even test any patterns.
>>
>> We can formalize the intuition outlined above as: type patterns that 
>> are _total_
>> on their target operand (`var x`, and `T t` on an operand of type
>`U`, 
>> where `U
>> <: T`) match null, and non-total type patterns do not. (Another way
>to say
>> this is: a `var` pattern is the "any" pattern, and a type pattern
>that 
>> is  total
>> on its operand type is also an "any" pattern.)  Additionally, the
>`null`
>> constant pattern matches null.  These are the _only_ nullable
>patterns.
>>
>> In our `Box` example, this means that the last case (whether written 
>> as `Box(var
>> o)` or `Box(Object o)`) matches all boxes, including those containing
>null
>> (because the nested pattern is total on the nested operand), but the 
>> first two
>> cases do not.
>>
>> If we retain the current absolute hostility of `switch` to nulls, we
>can't
>> trivially refactor from
>>
>> ```
>> switch (o) {
>>     case Box(Chocolate c):
>>     case Box(Frog f):
>>     case Box(var o):
>> }
>> ```
>> to
>>
>> ```
>> switch (o) {
>>     case Box(var contents):
>>         switch (contents) {
>>             case Chocolate c:
>>             case Frog f:
>>             case Object o:
>>         }
>>     }
>> }
>> ```
>>
>> because the inner `switch(contents)` would NPE before we tried to 
>> match any of
>> the patterns it contains.  Instead, the user would explicitly have to
>
>> do an `if
>> (contents == null)` test, and, if the intent was to handle `null` in 
>> the same
>> way as the `Object o` case, some duplication of code would be
>needed.  
>> We can
>> address this sharp corner by slightly relaxing the null-hostility of 
>> `switch`,
>> as described below.
>>
>> A similar sharp corner is the decomposition of a nested pattern
>`P(Q)` 
>> into
>> `P(alpha) & alpha instanceof Q`; while this is intended to be a 
>> universally
>> valid transformation, if P's 1st component might be null and Q is 
>> total,  this
>> transformation would not be valid because of the existing (mild) 
>> null-hostility
>> of `instanceof`.  Again, we may be able to address this by adjusting 
>> the rules
>> surrounding `instanceof` slightly.
>>
>> ## Generalizing switch
>>
>> The refactoring example above motivates why we might want to relax
>the
>> null-handling behavior of `switch`.  On the other hand, the one thing
>the
>> current behavior has going for it is that at least the current 
>> behavior is easy
>> to reason about; it always throws when confronted with a `null`.  Any
>
>> relaxed
>> behavior would be more complex; some switches would still have to 
>> throw (for
>> compatibility with existing semantics), and some (which can't be
>expressed
>> today) would accept nulls.  This is a tricky balance to achieve, but
>I 
>> think we
>> have a found a good one.
>>
>> A starting point is that we don't want to require readers to do an
>_O(n)_
>> analysis of each of the `case` labels just to determine whether a 
>> given switch
>> accepts `null` or not; this should be an _O(1)_ analysis.  (We do not
>
>> want to
>> introduce a new flavor of `switch`, such as `switch-nullable`; this 
>> might seem
>> to fix the proximate problem but would surely create others. As we've
>
>> done with
>> expression switch and patterns, we'd rather rehabilitate `switch`
>than 
>> create
>> an almost-but-not-quite-the-same variant.)
>>
>> Let's start with the null pattern, which we'll spell for sake of 
>> exposition
>> `case null`.  What if you were allowed to say `case null` in a
>switch, 
>> and the
>> switch would do the obvious thing?
>>
>> ```
>> switch (o) {
>>     case null -> System.out.println("Ugh, null");
>>     case String s -> System.out.println("Yay, non-null: " + s);
>> }
>> ```
>>
>> Given that the `case null` appears so close to the `switch`, it does 
>> not seem
>> confusing that this switch would match `null`; the existence of `case
>
>> null` at
>> the top of the switch makes it pretty clear that this is intended 
>> behavior.  (We
>> could further restrict the null pattern to being the first pattern in
>
>> a switch,
>> to make this clearer.)
>>
>> Now, let's look at the other end of the switch -- the last case. 
>What 
>> if the
>> last pattern is a total pattern?  (Note that if any `case` has a
>total 
>> pattern,
>> it _must_ be the last one, otherwise the cases after that would be 
>> dead, which
>> would be an error.)  Is it also reasonable for that to match null?  
>> After all,
>> we're saying "everything":
>>
>> ```
>> switch (o) {
>>     case String s: ...
>>     case Object o: ...
>> }
>> ```
>>
>> Under this interpretation, the switch-refactoring anomaly above goes
>away.
>>
>> The direction we're going here is that if we can localize the 
>> null-acceptance of
>> switches in the first (is it `case null`?) and last (is it total?) 
>> cases, then
>> the incremental complexity of allowing _some_ switches to accept null
>
>> might be
>> outweighed by the incremental benefit of treating `null` more 
>> uniformly (and
>> thus eliminating the refactoring anomalies.)  Note also that there is
>
>> no actual
>> code compatibility issue; this is all mental-model compatibility.
>>
>> So far, we're suggesting:
>>
>>  - A switch with a constant `null` case  will accept nulls;
>>  - If present, a constant `null` case must go first;
>>  - A switch with a total (any) case matches also accepts nulls;
>>  - If present, a total (any) case must go last.
>>
>> #### Relocating the problem
>>
>> It might be more helpful to view these changes as not changing the 
>> behavior of
>> `switch`, but of the `default` case of `switch`.  We can equally well
>
>> interpret
>> the current behavior as:
>>
>>  - `switch` always accepts `null`, but matching the `default` case of
>
>> a `switch`
>>    throws `NullPointerException`;
>>  - any `switch` without a `default` case has an implicit do-nothing 
>> `default`
>>    case.
>>
>> If we adopt this change of perspective, then `default`, not `switch`,
>
>> is in
>> control of the null rejection behavior -- and we can view these
>changes as
>> adjusting the behavior of `default`.  So we can recast the proposed 
>> changes as:
>>
>>   - Switches accept null;
>>   - A constant `null` case will match nulls, and must go first;
>>   - A total switch (a switch with a total `case`) cannot have a 
>> `default` case;
>>   - A non-total switch without a `default` case gets an implicit 
>> do-nothing
>>     `default` case;
>>   - Matching the (implicit or explicit) default case with a `null`
>operand
>>     always throws NPE.
>>
>> The main casualty here is that the `default` case does not mean the
>same
>> thing as `case var x` or `case Object o`.  We can't deprecate 
>> `default`, but
>> for pattern switches, it becomes much less useful.
>>
>> #### What about method (declared) patterns?
>>
>> So far, we've declared all patterns, except the `null` constant 
>> pattern and the
>> total (any) pattern, to not match `null`.  What about patterns that
>are
>> explicitly declared in code?  It turns out we can rule out these
>matching
>> `null` fairly easily.
>>
>> We can divide declared patterns into three kinds: deconstruction 
>> patterns (dual
>> to constructors), static patterns (dual to static methods), and
>instance
>> patterns (dual to instance methods.)  For both deconstruction and
>instance
>> patterns, the match target becomes the receiver; method bodies are
>never
>> expected to deal with the case where `this == null`.
>>
>> For static patterns, it is conceivable that they could match `null`, 
>> but this
>> would put a fairly serious burden on writers of static patterns to 
>> check for
>> `null` -- which they would invariably forget, and many more NPEs
>would 
>> ensue.
>> (Think about writing the pattern for `Optional.of(T t)` -- it would
>be
>> overwhelmingly likely we'd forget to check the target for nullity.)  
>> SO there
>> is a strong argument to simply say "declared patterns never match 
>> null", to
>> not put writers of such patterns in this situation.
>>
>> So, only the top and bottom patterns in a switch could match null; if
>
>> the top
>> pattern is not `case null`, and the bottom pattern is not total, then
>
>> the switch
>> throws NPE on null, otherwise it accepts null.
>>
>> #### Adjusting instanceof
>>
>> The remaining anomaly we had was about unrolling nested patterns when
>
>> the inner
>> pattern is total.  We can plug this by simply outlawing total
>patterns in
>> `instanceof`.
>>
>> This may seem like a cheap trick, but it makes sense on its own.  If
>the
>> following statement was allowed:
>>
>> ```
>> if (e instanceof var x) { X }
>> ```
>>
>> it would simply be confusing; on the one hand, it looks like it
>should 
>> always
>> match, but on the other, `instanceof` is historically null-hostile.  
>> And, if the
>> pattern always matches, then the `if` statement is silly; it should
>be 
>> replaced
>> with:
>>
>> ```
>> var x = e;
>> X
>> ```
>>
>> since there's nothing conditional about it.  So by banning "any" 
>> patterns on the
>> RHS of `instanceof`, we both avoid a confusion about what is going to
>
>> happen,
>> and we prevent the unrolling anomaly.
>>
>> For reasons of compatibility, we will have to continue to allow
>>
>> ```
>> if (e instanceof Object) { ... }
>> ```
>>
>> which succeeds on all non-null operands.
>>
>> We've been a little sloppy with the terminology of "any" vs "total"; 
>> note that
>> in
>>
>> ```
>> Point p;
>> if (p instanceof Point(var x, var y)) { }
>> ```
>>
>> the pattern `Point(var x, var y)` is total on `Point`, but not an 
>> "any" pattern
>> -- it still doesn't match on p == null.
>>
>> On the theory that an "any" pattern in `instanceof` is silly, we may 
>> also want
>> to ban other "silly" patterns in `instanceof`, such as constant 
>> patterns, since
>> all of the following have simpler forms:
>>
>> ```
>> if (x instanceof null) { ... }
>> if (x instanceof "") { ... }
>> if (i instanceof 3) { ... }
>> ```
>>
>> In the first round (type patterns in `instanceof`), we mostly didn't 
>> confront
>> this issue, saying that `instanceof T t` matched in all the cases
>where
>> `instanceof T` would match.  But given that the solution for `switch`
>
>> relies
>> on "any" patterns matching null, we may wish to adjust the behavior
>of
>> `instanceof` before it exits preview.
>>
>>
>> [jep305]: https://openjdk.java.net/jeps/305
>> [patternmatch]: pattern-match.html
>>

-- 
Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma 
brièveté.

Re: Next up for patterns: type patterns in switch

Reply via email to