I like this. One question: what does this new theory have to say about the
situation
    switch (x) {
        case Foo(int x):
            int y = x;
            // fall through
        case Bar(int x, int y):
            …
    }
? Perhaps it is forbidden because the “int y” in the pattern would shadow the
“int y” in the earlier declaration? Or can the two be merged?
—Guy
> On Nov 20, 2017, at 1:17 PM, Brian Goetz wrote:
>
>
> We had a long meeting regarding scoping and shadowing of pattern variables.
> We ended up in a good place, and we were all a bit surprised at where it
> seems to be pointing.
>
> We started with two use cases that we thought were important:
>
> Re-use of binding variables:
>
>     switch (x) {
>         case Foo(var a): ... break;
>         case Bar(var a): ...
>     }
>
> Short-circuiting tests:
>
>     if (!(x matches Foo(var a)))
>         throw new NotFooException();
>     // use a here
>
> We had a few nice-to-haves:
> - that binding variables should be ordinary variables, not something new;
> - that bindings, once assigned, be final.
>
> Where we expected to land was something like:
> - binding variables are treated as blank finals
> - binding variables are hoisted into a synthetic block, which starts right
> before the statement containing the expression defining the binding
> - it is permitted for locals to shadow other locals that are DU at the point
> of shadowing. (This, as a bonus, would rescue the existing unfortunate
> scoping of local variables defined in switch blocks.)
>
> We thought this was a sensible place to land because it built on the existing
> notion of scoping and local variables. The remaining question, it seemed,
> was: "where does this synthetic scope end."
>
> First, a note about where the scope starts. Consider:
>
>     if (e1 && x matches Foo(var a)) {
>         ...
>     }
>
> Logically, we'd like to start the scope for `a` right where it is first
> declared; this is how locals work. But, if we want to maintain the existing
> concept of local variable scope, it has to start earlier. The latest
> candidate is right before the if starts; we act as if there is an invisible {
> ... } containing the entirety of the if statement, and declare `a` there.
>
> This means, though, that the scope of `a` includes `e1`, even though `a` is
> declared later. This is confusing, but maybe we can ignore this, and provide
> a clear diagnostic if the user stumbles across it.
>
> So, where does the scope end? The obvious candidate is right after the if
> statement. This means `a` is in scope for the entire if-else, but, because
> it is DU in the else-blocks, can be reused if we adopt the "shadowing OK if
> DU" rule.
>
> FWIW, the "shadowing ok if DU" rule is clever, and gives us the behavior we
> want for switch / if-else chains with patterns, but has some collateral
> damage. For example, the following would become valid code:
>
>     int x; // declared but never used
>     float x = 1.0f; // acceptable shadowing of int x
>
> Again, maybe we can ignore this. But where things really blew up was
> attempting to handle the short-circuiting if case:
>
>     if (!(x matches Foo(var a)))
>         throw new NotFooException();
>     // use a here
>
> For this to work, we'd have to extend the scope to the end of the block
> containing the if statement. Now, given our "shadowing is OK if DU" rule,
> this is fine, right? Not so fast. In this simpler case:
>
>     if (x matches Foo(var b)) { }
>     // try to reuse b here, I dare you
>
> we find that
> - `b` is neither DU nor DA after the if, so we can't shadow it;
> - `b` is final and not DU, so we can't write to it;
> - `b` is not DA, so we can't use it.
>
> In other words, `b` is a permanent toxic waste zone: we can neither use it,
> nor redeclare it, nor assign to it. Urk.
>
> Note too that our scoping rule is not really about unbalanced ifs; it's about
> abrupt completion. This is reasonable too:
>
>     if (x matches Foo(var a)) {
>         println("Matched!");
>     }
>     else
>         throw new NotFooException();
>     // reasonable to use a here too!
>
> Taking stock: our goal here was to try and use normal scopes and blank final
> semantics to describe binding variables, out of a desire to not introduce new
> concepts. But it's a bad fit; the scope may be unnaturally large on the
> beginning side, and wherever we set the end of the scope, we end up in a
> choice of bad situations (either something we want in scope is not, or
> something we don't want in scope is.) So traditional scopes are just a bad
> approximation, and what we gain in "reusing familiar concepts", we lose in
> the mismatch.
>
>
> STEPPING BACK
>
> What we realized at this point is that the essence of binding variables is
> their _conditionality_. There is not a single logical old-style scope that
> describes the right set of places for a binding to be in scope, but there is
> a well-defined control-flow analysis that tells us exactly where we can use
> the binding, and where we can't. This is the flow-scoping construct we
> initially worried was too "new and different." But, after some further
> thought, and a few tweaks, this seems exactly what we want, and I think can
> be made understandable.
>
> The basic idea behind flow-scoping is: a binding variable is in scope where
> it is well-defined, and not in scope when it is not. We'll provide a
> complete calculus, but the key thing to understand is that the rules of flow
> scoping are just plain old DA/DU; if a binding is DA, then it is
> well-defined.
>
> In particular, flow-scoping can handle abrupt termination naturally; for a
> statement:
>
>     if (x matches Foo(var a)) { A }
>     else { B }
>     C
>
> the scope of `a` includes A, and also includes C iff B completes abruptly.
> We can easily explain this as:
> - if x matches Foo(var a), we execute the A block, and in this case `a` is
> clearly well-defined (as we'd not execute A if the match failed);
> - The only way to reach C, if B completes abruptly, is if the match
> succeeds, so `a` is well defined during C in this case too.
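
For readers who want to try the A/B/C rule above, here is a minimal sketch. It assumes the `instanceof` patterns that later shipped in Java 16 as a stand-in for the `matches` syntax proposed in this thread, and the names `FlowScopeSketch`, `lengthOrThrow`, and `shout` are hypothetical.

```java
// Sketch only: Java 16+ `instanceof` patterns stand in for the proposed
// `x matches Foo(var a)`. All class and method names are hypothetical.
final class FlowScopeSketch {

    // Mirrors "if (...) { A } else { B } C" where B completes abruptly.
    static int lengthOrThrow(Object x) {
        if (x instanceof String a) {
            // A: the match succeeded, so `a` is well-defined here
            System.out.println("Matched!");
        } else {
            // B: completes abruptly, so C is reachable only via A
            throw new IllegalArgumentException("not a String");
        }
        // C: `a` is still in scope because the only way here is a match
        return a.length();
    }

    // The short-circuiting form: a negated test followed by an abrupt exit.
    static String shout(Object x) {
        if (!(x instanceof String a)) {
            throw new IllegalArgumentException("not a String");
        }
        // `a` is in scope for the rest of the enclosing block
        return a.toUpperCase();
    }
}
```

With these definitions, `FlowScopeSketch.lengthOrThrow("hello")` returns 5, while passing an `Integer` throws before `a` is ever read.
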
>
> Because the scope of a binding variable is precisely the cases in which it is
> well defined, there is no need to tinker with shadowing.
>
> Conditional variables can now always be final, because they will never be in
> scope and not DA.
>
> Similarly, folding reachability into scoping for conditional variables also
> means that fallthrough has a well-defined meaning. If we have:
>
>     case Foo(int x): ... break;
>     case Bar(int x): ....
>
> then the Bar case is not reachable from where x would be initialized, so the
> first x is not in scope when the second x is declared, and everything is
> great. On the other hand:
>
>     case Foo(int x): ... no break ...
>     case Bar(int x): ... A ...
>
> now x is well-defined in A, no matter how we got there. (The merging of the
> two xs is the same merging we have to do anyway for "if (x matches Foo(int a)
> || x matches Bar(int a))".)
>
>
> People had originally expressed concern that flow-scoping leaves a scope
> "with holes", and allows puzzlers with shadowing of fields. (This is the
> "swiss cheese" problem.) For example:
>
>     // Field
>     String s;
>
>     if (!(x matches String s)) {
>         a(s);
>     }
>     else {
>         b(s);
>     }
>
> This would be confusing because the `s` passed to a() is the field, but the
> `s` passed to b() is the binding. But, there's a really simple way to
> prevent this: do not allow conditional variables to shadow fields or locals.
> Now, there is no chance of this confusion, and this is not a big constraint,
> because the names of conditional variables are strictly local. (Further, we
> can disallow shadowing of in-scope conditional variables by locals (or other
> conditional variables.))
>
>
> Scorecard:
> - Relatively straightforward to spec, as we have a clean calculus for
> flow-scoped conditional variables;
> - Relatively straightforward to implement (our prototype already does this);
> - One new concept: conditional variables;
> - Conditional vars are in scope where they make sense, and not in scope where
> they do not, cannot be assigned to (always DA and final when in scope), and
> are never in scope when not DA;
> - No changes to shadowing;
> - Meets all the target use cases.
>
>
>
>
> On 11/3/2017 6:44 AM, Gavin Bierman wrote:
>> Scopes
>>
>> Java has five constructs that introduce fresh variables into scope: the
>> local variable declaration statement, the for statement, the
>> try-with-resources statement, the catch block, and lambda expressions. The
>> first, the local variable declaration statement, introduces variables that
>> are in scope for the rest of the block in which they are declared. The
>> others introduce variables that are limited in their scope.
>>
>> The addition of pattern matching brings a new expression, matches, and
>> extends the switch statement. Both these constructs can now introduce fresh
>> (and, if the pattern match succeeds, definitely assigned (DA)) variables.
>> But the question is what is the scope of these ‘pattern’ variables?
>>
>> Let us consider the pattern matching constructs in turn. First the switch
>> statement:
>>
>>     switch (o) {
>>         case int i: ...
>>         case ..
>>     }
>>
>> What is the scope of the pattern variable i? There is a range of options.
>>
>> The scope of the pattern variable is from the start of the switch statement
>> until the end of the enclosing block.
>>
>> In this case the pattern variable is in scope but would be definitely
>> unassigned (DU) immediately after the switch statement.
>>
>>     switch (o) {
>>         case int i : ... // DA
>>                      ... // DA
>>         case T t :       // i is in scope
>>     }
>>     ... // i is still in scope and DU
>>
>> +ve Simple
>> -ve Can’t simply reuse a pattern variable in the same switch statement
>> (without some form of shadowing)
>> -ve Pattern variable poisons the rest of the block
>>
>> The scope of the pattern variable extends only to the end of the switch
>> block.
>>
>> In this case the pattern variable would be considered DA only for the
>> statements between the current case label and the subsequent case labeled
>> statement. For example:
>>
>>     switch (o) {
>>         case int i : ... // DA
>>                      ... // DA
>>         case T t :       // i is in scope but not DA
>>     }
>>     ... // i not in scope
>>
>> +ve Simple
>> +ve Pattern variables not poisoned in subsequent statements in the rest of
>> the block
>> +ve Similar technique to that used for identifiers declared in a for
>> statement (not a new idea)
>> -ve Can’t simply reuse a pattern variable in the same switch statement
>> (without some form of shadowing)
>>
>> The scope of the pattern variable extends only to the next case label.
>>
>>     switch (o) {
>>         case int i : ... // in scope and DA
>>                      ... // in scope and DA
>>         case T i :       // int i not in scope, so can re-use
>>     }
>>     ... // i not in scope
>>
>> +ve Simple syntactic rule
>> +ve Allows reuse of pattern variable in the same switch statement.
>> -ve Doesn’t make sense for fallthrough
>>
>> NOTE This final point is important - supporting fallthrough impacts on what
>> solution we might choose for scoping of pattern variables. (We could not
>> support fallthrough and instead support OR patterns - a further design
>> dimension.)
>>
>> ASIDE Should we support a switch expression, it seems clear that scoping
>> should be treated in the same way as it is for lambda expressions.
>>
>> The matches expression is unusual in that it is an expression that
>> introduces a fresh variable. What is the scope of this variable? We want it
>> to be more than the expression itself, as we want the following example code
>> to be correct:
>>
>>     if (e matches String s) {
>>         System.out.println("It's a string - " + s);
>>     }
>>
>> In other words, the variable introduced by the pattern needs to be in scope
>> for an enclosing IfThen statement.
>>
>> However, a match expression could be nested within another expression. It
>> seems reasonable that the pattern variables are in scope for at least the
>> rest of the expression. For example:
>>
>>     (e matches String s || s.length() > 0)
>>
>> Here the s should be in scope for the subexpression s.length (although it is
>> not DA). In contrast:
>>
>>     (e matches String s && s.length() > 0)
>>
>> Here the s is both in scope and DA for the subexpression s.length.
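
The `&&` case can be tried directly. The sketch below assumes Java 16+ `instanceof` patterns as a stand-in for `matches`; the class and method names are hypothetical. Under the flow-scoping rules Java eventually adopted, the plain `||` variant above is rejected because `s` is not in scope on its right-hand side, so the second method negates the test to show where `||` does see the binding.

```java
// Sketch only: `instanceof` patterns (Java 16+) stand in for `matches`.
// Class and method names are hypothetical.
final class ConditionalScopeSketch {

    // The right operand of `&&` runs only when the match succeeded,
    // so `s` is in scope and definitely assigned there.
    static boolean isNonEmptyString(Object e) {
        return e instanceof String s && s.length() > 0;
    }

    // The right operand of `||` runs only when the left operand is false,
    // i.e. when the negated test failed and the match therefore succeeded.
    static boolean isMissingOrEmpty(Object e) {
        return !(e instanceof String s) || s.isEmpty();
    }
}
```
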
>>
>> However, what about the following:
>>
>>     if (s.length() > 0 && e matches String s) {
>>         System.out.println(s);
>>     }
>>
>> Given the idea that a pattern variable flows from the inside-out to the
>> enclosing statement, it would appear that s is in scope for the
>> subexpression s.length, although it is not DA. Unless we want scopes to be
>> non-contiguous, we will have to accept this rather odd situation (consider
>> where s shadows a field). [This appears to be what happens in the current C#
>> compiler.]
>>
>> Now let’s consider how far a pattern variable flows wrt its enclosing
>> statement. We have a range of options:
>>
>> (a) The scope is both the statement that the match expression occurs in and
>> the rest of the block. In this scenario,
>>
>>     if (o matches T t) {
>>         ...
>>     } else {
>>         ...
>>     }
>>
>> is treated as equivalent to the following pseudo-code (where match-and-bind
>> is a fictional pattern matching construct that pattern-matches and binds to
>> a variable that has already been declared)
>>
>>     T t;
>>     if (o match-and-bind t) {
>>         // t in scope and DA
>>     } else {
>>         // t in scope and DU
>>     }
>>     // t in scope and DU
>>
>> This is how the current C# compiler works (although the spec describes the
>> next option; so perhaps this is a bug).
>>
>> (b) The scope is just the statement that the match expression occurs in. In
>> this scenario,
>>
>>     if (o matches T t) {
>>         ...
>>     } else {
>>
>>     }
>>     ...
>>
>> is treated as equivalent to the pseudo-code
>>
>>     {   T t;
>>         if (o match-and-bind t) {
>>             // t in scope and DA
>>         } else {
>>             // t in scope and DU
>>             // thus declaration int t = 42; is not allowed.
>>         }
>>     }
>>     // t not in scope
>>     ...
>>
>> This restricted scope allows reuse of pattern variables, e.g.
>>
>>     if (o matches T x) { ... }
>>     if (o matches S x) { ... }
>>
>> (c) The scope of the pattern variable is determined by a flow analysis of the
>> enclosing statement. (It could be thought of as a refinement of option b.)
>> This is currently implemented in the prototype compiler. For example:
>>
>>     if (!!(o matches T t)) {
>>         // t in scope
>>     } else {
>>         // t not in scope
>>     }
>>
>> +ve Code will work in the presence of most refactorings
>> +ve We have this code working already :-)
>> -ve This breaks the existing notion of scope as a contiguous program
>> fragment. A scope can now have holes in it. Will users ever understand this?
>> (Although they are very similar to the flow-based rules for DA/DU.)
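
The double-negation example can be checked against the flow-scoping rules that later shipped. This is a minimal sketch, assuming Java 16+ `instanceof` patterns in place of `matches`; `NegationScopeSketch` and `classify` are hypothetical names. It also shows the hole in the scope being put to use: the name `t` is free again in the else branch.

```java
// Sketch only: `instanceof` patterns (Java 16+) stand in for `matches`.
// Class and method names are hypothetical.
final class NegationScopeSketch {

    static String classify(Object o) {
        if (!!(o instanceof Integer t)) {
            // t in scope: the two negations cancel, so this branch
            // runs only when the match succeeded
            return "int " + (t + 1);
        } else {
            // the Integer t is not in scope here, so the name can be
            // reused for a different binding
            if (o instanceof String t) {
                return "string " + t.length();
            }
        }
        return "other";
    }
}
```
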
>>
>> ASIDE Regardless of whether we opt for (b) or (c) we may consider a further
>> extension where we allow the scope to extend beyond the current statement
>> for the case of an unbalanced if statement. For example:
>>
>> ```
>> if (!(o matches T t)) {
>>     return;
>> }
>> // t in scope
>> ...
>> return;
>> ```
>> +ve Supports a common idiom where else blocks are not needed
>> -ve Yet further complication of notion of scope.
>>
>