Re: PM design question: Scopes

Guy Steele Mon, 20 Nov 2017 10:48:09 -0800

I like this.  One question: what does this new theory have to say about the 
situation


switch (x) {
  case Foo(int x):
        int y = x;
        // fall through
  case Bar(int x, int y):
        …
}

?  Perhaps it is forbidden because the “int y” in the pattern would shadow the 
“int y” in the earlier declaration?  Or can the two be merged?

—Guy


> On Nov 20, 2017, at 1:17 PM, Brian Goetz wrote:
> 
> 
> We had a long meeting regarding scoping and shadowing of pattern variables.  
> We ended up in a good place, and we were all a bit surprised at where it 
> seems to be pointing.
> 
> We started with two use cases that we thought were important: 
> 
> Re-use of binding variables: 
> 
>     switch (x) { 
>         case Foo(var a): ...  break; 
>         case Bar(var a): ... 
>     } 
> 
> Short-circuiting tests: 
> 
>     if (!(x matches Foo(var a))) 
>         throw new NotFooException(); 
>     // use a here 
> 
> We had a few nice-to-haves: 
>  - that binding variables should be ordinary variables, not something new; 
>  - that bindings, once assigned, be final 
> 
> Where we expected to land was something like: 
>  - binding variables are treated as blank finals 
>  - binding variables are hoisted into a synthetic block, which starts right 
> before the statement containing the expression defining the binding 
>  - it is permitted for locals to shadow other locals that are DU at the point 
> of shadowing.  (This, as a bonus, would rescue the existing unfortunate 
> scoping of local variables defined in switch blocks.) 
> 
> We thought this was a sensible place to land because it built on the existing 
> notion of scoping and local variables.  The remaining question, it seemed, 
> was: "where does this synthetic scope end." 
> 
> First, a note about where the scope starts.  Consider: 
> 
>     if (e1 && x matches Foo(var a)) { 
>         ... 
>     } 
> 
> Logically, we'd like to start the scope for `a` right where it is first 
> declared; this is how locals work.  But, if we want to maintain the existing 
> concept of local variable scope, it has to start earlier.  The latest 
> candidate is right before the if starts; we act as if there is an invisible { 
> ... } containing the entirety of the if statement, and declare `a` there. 
> 
> This means, though, that the scope of `a` includes `e1`, even though `a` is 
> declared later.  This is confusing, but maybe we can ignore this, and provide 
> a clear diagnostic if the user stumbles across it. 
> 
> So, where does the scope end?  The obvious candidate is right after the if 
> statement.  This means `a` is in scope for the entire if-else, but, because 
> it is DU in the else-blocks, can be reused if we adopt the "shadowing OK if 
> DU" rule. 
> 
> FWIW, the "shadowing ok if DU" rule is clever, and gives us the behavior we 
> want for switch / if-else chains with patterns, but has some collateral 
> damage.  For example, the following would become valid code: 
> 
>     int x;  // declared but never used 
>     float x = 1.0f;  // acceptable shadowing of int x 
> 
> Again, maybe we can ignore this.  But where things really blew up was 
> attempting to handle the short-circuiting if case: 
> 
>     if (!(x matches Foo(var a))) 
>         throw new NotFooException(); 
>     // use a here 
> 
> For this to work, we'd have to extend the scope to the end of the block 
> containing the if statement.  Now, given our "shadowing is OK if DU" rule, 
> this is fine, right?  Not so fast.  In this simpler case: 
> 
>     if (x matches Foo(var b)) { } 
>     // try to reuse b here, I dare you 
> 
> we find that 
>  - `b` is neither DU nor DA after the if, so we can't shadow it; 
>  - `b` is final and not DU, so we can't write to it; 
>  - `b` is not DA, so we can't use it. 
> 
> In other words, `b` is a permanent toxic waste zone: we can neither use it, 
> nor redeclare it, nor assign to it.  Urk. 
> 
> Note too that our scoping rule is not really about unbalanced ifs; it's about 
> abrupt completion.  This is reasonable too: 
> 
>     if (x matches Foo(var a)) { 
>         println("Matched!"); 
>     } 
>     else 
>         throw new NotFooException(); 
>     // reasonable to use a here too! 
> 
> Taking stock: our goal here was to try and use normal scopes and blank final 
> semantics to describe binding variables, out of a desire to not introduce new 
> concepts.  But it's a bad fit; the scope may be unnaturally large on the 
> beginning side, and wherever we set the end of the scope, we end up in a 
> choice of bad situations (either something we want in scope is not, or 
> something we don't want in scope is.)  So traditional scopes are just a bad 
> approximation, and what we gain in "reusing familiar concepts", we lose in 
> the mismatch. 
> 
> 
> STEPPING BACK 
> 
> What we realized at this point is that the essence of binding variables is 
> their _conditionality_.  There is not a single logical old-style scope that 
> describes the right set of places for a binding to be in scope, but there is 
> a well-defined control-flow analysis that tells us exactly where we can use 
> the binding, and where we can't.  This is the flow-scoping construct we 
> initially worried was too "new and different."  But, after some further 
> thought, and a few tweaks, this seems exactly what we want, and I think can 
> be made understandable. 
> 
> The basic idea behind flow-scoping is: a binding variable is in scope where 
> it is well-defined, and not in scope when it is not.  We'll provide a 
> complete calculus, but the key thing to understand is that the rules of flow 
> scoping are just plain old DA/DU; if a binding is DA, then it is 
> well-defined. 
> 
> In particular, flow-scoping can handle abrupt termination naturally; for a 
> statement: 
> 
>     if (x matches Foo(var a)) { A } 
>     else { B } 
>     C 
> 
> the scope of `a` includes A, and also includes C iff B completes abruptly.  
> We can easily explain this as: 
>  - if x matches Foo(var a), we execute the A block, and in this case `a` is 
> clearly well-defined (as we'd not execute A if the match failed); 
>  - The only way to reach C, if B completes abruptly, is if the match 
> succeeds, so `a` is well defined during C in this case too. 
> 
> Because the scope of a binding variable is precisely the cases in which it is 
> well defined, there is no need to tinker with shadowing. 
> 
> Conditional variables can now always be final, because they will never be in 
> scope and not DA. 
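
A minimal sketch of the two abrupt-completion cases described above. It assumes a Java 16 or later compiler and uses the `instanceof` pattern syntax that later shipped, not the `matches` syntax discussed in this thread. The class and method names are hypothetical, and the example only illustrates the idea, not the full calculus.

```java
// Illustrative only: uses `instanceof` patterns (Java 16+) in place of the
// `matches` operator from this thread. All names here are hypothetical.
class FlowScopeDemo {

    // Negated test with an abrupt then-branch: the only way to get past the
    // `if` is for the match to have succeeded, so `s` is usable afterwards.
    static int lengthOrThrow(Object x) {
        if (!(x instanceof String s)) {
            throw new IllegalArgumentException("not a String: " + x);
        }
        return s.length(); // s is in scope and definitely assigned here
    }

    // Positive test whose else-branch completes abruptly: the binding is in
    // scope in the then-branch and also after the whole if-else.
    static String describe(Object x) {
        if (x instanceof Integer i) {
            System.out.println("Matched!");
        } else {
            throw new IllegalArgumentException("not an Integer: " + x);
        }
        return "int " + i; // reachable only when the match succeeded
    }

    public static void main(String[] args) {
        System.out.println(lengthOrThrow("hello")); // prints 5
        System.out.println(describe(7));            // prints "Matched!" then "int 7"
    }
}
```

In neither method is the binding ever visible at a point where it could be unassigned, which is the property that lets such variables be treated as always final.
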
> 
> Similarly, folding reachability into scoping for conditional variables also 
> means that fallthrough has a well-defined meaning.  If we have:
> 
>     case Foo(int x): ... break;
>     case Bar(int x): ....
> 
> then the Bar case is not reachable from where x would be initialized, so the 
> first x is not in scope when the second x is declared, and everything is 
> great.  On the other hand:
> 
>     case Foo(int x): ... no break ...
>     case Bar(int x): ... A ...
> 
> now x is well-defined in A, no matter how we got there.  (The merging of the 
> two xs is the same merging we have to do anyway for "if (x matches Foo(int a) 
> || x matches Bar(int a))".)  
> 
> 
> People had originally expressed concern that flow-scoping leaves a scope 
> "with holes", and allows puzzlers with shadowing of fields. (This is the 
> "swiss cheese" problem.) For example: 
> 
>     // Field 
>     String s; 
> 
>     if (!(x matches String s)) { 
>         a(s); 
>     } 
>     else { 
>         b(s); 
>     } 
> 
> This would be confusing because the `s` passed to a() is the field, but the 
> `s` passed to b() is the binding.  But, there's a really simple way to 
> prevent this: do not allow conditional variables to shadow fields or locals.  
> Now, there is no chance of this confusion, and this is not a big constraint, 
> because the names of conditional variables are strictly local.  (Further, we 
> can disallow shadowing of in-scope conditional variables by locals (or by other 
> conditional variables).) 
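
For illustration, the puzzler above can be reproduced with the `instanceof` pattern syntax that later shipped (Java 16 or later is assumed). That version of the language does allow a binding to shadow a field, which differs from the restriction proposed here. The class, field value, and method name below are hypothetical.

```java
// Illustrative only: reproduces the field-shadowing puzzler with Java 16+
// `instanceof` patterns. Names and values are made up for the example.
class SwissCheeseDemo {
    String s = "field";

    String which(Object x) {
        if (!(x instanceof String s)) {
            // The binding is not in scope here, so `s` is the field.
            return "a(" + s + ")";
        } else {
            // The binding is in scope here and shadows the field.
            return "b(" + s + ")";
        }
    }

    public static void main(String[] args) {
        SwissCheeseDemo d = new SwissCheeseDemo();
        System.out.println(d.which(42));   // prints a(field)
        System.out.println(d.which("hi")); // prints b(hi)
    }
}
```

The same name refers to two different variables in adjacent branches, which is exactly the confusion a no-shadowing rule would rule out.
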
> 
> 
> Scorecard: 
>  - Relatively straightforward to spec, as we have a clean calculus for 
> flow-scoped conditional variables; 
>  - Relatively straightforward to implement (our prototype already does this); 
>  - One new concept: conditional variables; 
>  - Conditional vars are in scope where they make sense, and not in scope where 
> they do not, cannot be assigned to (always DA and final when in scope), and 
> are never in scope when not DA; 
>  - No changes to shadowing; 
>  - Meets all the target use cases. 
> 
> 
> 
> 
> On 11/3/2017 6:44 AM, Gavin Bierman wrote:
>> Scopes
>> 
>> Java has five constructs that introduce fresh variables into scope: the 
>> local variable declaration statement, the for statement, the 
>> try-with-resources statement, the catch block, and lambda expressions. The 
>> first, the local variable declaration statement, introduces variables that 
>> are in scope for the rest of the block in which they are declared. The others 
>> introduce variables that are limited in their scope.
>> 
>> The addition of pattern matching brings a new expression, matches, and 
>> extends the switch statement. Both these constructs can now introduce fresh 
>> (and, if the pattern match succeeds, definitely assigned (DA)) variables. 
>> But the question is what is the scope of these ‘pattern’ variables?
>> 
>> Let us consider the pattern matching constructs in turn. First the switch 
>> statement:
>> 
>> switch (o) {
>>     case int i: ...
>>     case ..
>> }
>> What is the scope of the pattern variable i? There is a range of options.
>> 
>> The scope of the pattern variable is from the start of the switch statement 
>> until the end of the enclosing block.
>> 
>> In this case the pattern variable is in scope but would be definitely 
>> unassigned (DU) immediately after the switch statement.
>> 
>> switch (o) {
>>     case int i : ... // DA
>>                  ... // DA
>>     case T t :       // i is in scope 
>> }
>> ... // i is still in scope and DU
>> +ve Simple
>> -ve Can’t simply reuse a pattern variable in the same switch statement 
>> (without some form of shadowing)
>> -ve Pattern variable poisons the rest of the block
>> 
>> The scope of the pattern variable extends only to the end of the switch 
>> block.
>> 
>> In this case the pattern variable would be considered DA only for the 
>> statements between the current case label and the subsequent case labeled 
>> statement. For example:
>> 
>> switch (o) {
>>     case int i : ... // DA
>>                  ... // DA
>>     case T t :       // i is in scope but not DA
>> }
>> ... // i not in scope
>> +ve Simple
>> +ve Pattern variables not poisoned in subsequent statements in the rest of 
>> the block
>> +ve Similar to the scoping of identifiers declared in a for statement (not a 
>> new idea)
>> -ve Can’t simply reuse a pattern variable in the same switch statement 
>> (without some form of shadowing)
>> 
>> The scope of the pattern variable extends only to the next case label.
>> 
>> switch (o) {
>>     case int i : ... // in scope and DA
>>                  ... // in scope and DA
>>     case T i :       // int i not in scope, so can re-use
>> }
>> ... // i not in scope
>> +ve Simple syntactic rule
>> +ve Allows reuse of pattern variable in the same switch statement.
>> -ve Doesn’t make sense for fallthrough
>> NOTE This final point is important - supporting fallthrough impacts on what 
>> solution we might choose for scoping of pattern variables. (We could not 
>> support fallthrough and instead support OR patterns - a further design 
>> dimension.)
>> 
>> ASIDE Should we support a switch expression, it seems clear that scoping 
>> should be treated in the same way as it is for lambda expressions.
>> 
>> The matches expression is unusual in that it is an expression that 
>> introduces a fresh variable. What is the scope of this variable? We want it 
>> to be more than the expression itself, as we want the following example code 
>> to be correct:
>> 
>> if (e matches String s) {
>>     System.out.println("It's a string - " + s);
>> }
>> In other words, the variable introduced by the pattern needs to be in scope 
>> for an enclosing IfThen statement.
>> 
>> However, a matches expression could be nested within another expression. It 
>> seems reasonable that the pattern variables are in scope for at least the 
>> rest of the expression. For example:
>> 
>> (e matches String s || s.length() > 0) 
>> Here the s should be in scope for the subexpression s.length (although it is 
>> not DA). In contrast:
>> 
>> (e matches String s && s.length() > 0)
>> Here the s is both in scope and DA for the subexpression s.length.
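
A small sketch of the `&&` case, assuming Java 16 or later and the `instanceof` pattern syntax that later shipped in place of `matches`. The class and method names are hypothetical.

```java
// Illustrative only: the binding introduced on the left of `&&` is in scope
// and definitely assigned on the right, because the right operand is
// evaluated only when the match succeeded.
class ConditionalAndDemo {
    static boolean isNonEmptyString(Object e) {
        return e instanceof String s && s.length() > 0;
    }

    // Note: under the shipped rules the `||` variant,
    //     e instanceof String s || s.length() > 0
    // does not compile, since `s` is not in scope on the right-hand side.

    public static void main(String[] args) {
        System.out.println(isNonEmptyString("abc")); // prints true
        System.out.println(isNonEmptyString(""));    // prints false
        System.out.println(isNonEmptyString(42));    // prints false
    }
}
```
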
>> 
>> However, what about the following:
>> 
>> if (s.length() > 0 && e matches String s) {
>>     System.out.println(s);
>> }
>> Given the idea that a pattern variable flows from the inside-out to the 
>> enclosing statement, it would appear that s is in scope for the 
>> subexpression s.length; although it is not DA. Unless we want scopes to be 
>> non-contiguous, we will have to accept this rather odd situation (consider 
>> where s shadows a field). [This appears to be what happens in the current C# 
>> compiler.]
>> 
>> Now let’s consider how far a pattern variable flows wrt its enclosing 
>> statement. We have a range of options:
>> 
>> a. The scope is both the statement that the match expression occurs in and the 
>> rest of the block. In this scenario,
>> 
>> if (o matches T t) {
>>     ... 
>> } else {
>>     ...
>> }
>> is treated as equivalent to the following pseudo-code (where match-and-bind 
>> is a fictional pattern matching construct that pattern-matches and binds to 
>> a variable that has already been declared)
>> 
>> T t;
>> if (o match-and-bind t) {
>>     // t in scope and DA
>> } else {
>>     // t in scope and DU
>> }
>> // t in scope and DU
>> This is how the current C# compiler works (although the spec describes the 
>> next option; so perhaps this is a bug).
>> 
>> b. The scope is just the statement that the match expression occurs in. In this 
>> scenario,
>> 
>> if (o matches T t) {
>>     ... 
>> } else {
>> 
>> }
>> ...
>> is treated as equivalent to the pseudo-code
>> 
>> { T t;
>>   if (o match-and-bind t) {
>>       // t in scope and DA
>>   } else {
>>       // t in scope and DU
>>       // thus declaration int t = 42; is not allowed.
>>   }
>> }
>> // t not in scope
>> ...
>> This restricted scope allows reuse of pattern variables, e.g.
>> 
>> if (o matches T x) { ... }
>> if (o matches S x) { ... }
>> c. The scope of the pattern variable is determined by a flow analysis of the 
>> enclosing statement. (It could be thought of as a refinement of option b.) 
>> This is currently implemented in the prototype compiler. For example:
>> 
>> if (!!(o matches T t)) {
>>      // t in scope
>> } else {
>>      // t not in scope
>> }
>> +ve Code will work in the presence of most refactorings
>> +ve We have this code working already :-)
>> -ve This breaks the existing notion of scope as a contiguous program 
>> fragment. A scope can now have holes in it. Will users ever understand this? 
>> (Although they are very similar to the flow-based rules for DA/DU.)
>> ASIDE Regardless of whether we opt for (b) or (c) we may consider a further 
>> extension where we allow the scope to extend beyond the current statement 
>> for the case of an unbalanced if statement. For example
>> 
>> ```
>> if (!(o matches T t)) {
>>     return;
>> }
>> // t in scope 
>> ...
>> return;
>> ```
>> +ve Supports a common idiom where else blocks are not needed
>> -ve Yet further complication of notion of scope.
>> 
>
