[lvti] Handling of capture variables

Dan Smith Fri, 31 Mar 2017 16:40:21 -0700

As described in the JEP 286 spec document, inferring the type of a local 
variable to be a non-denotable type (one that can't be written in source) is 
something to be careful about, due to "potential for confusion, bad error 
messages, or added exposure to bugs".


The most significant area here (in terms of likely frequency) is the presence 
of capture variables in the type. I did some analysis of the Java SE APIs to 
identify and illustrate problematic cases.

== Case 1: wildcard-parameterized return type ==

Any method (or field) that returns a wildcard-parameterized type will produce a 
non-denotable type on invocation, because the return type must be captured (JLS 
15.12.3).

var myClass = getClass();
var c = Class.forName("java.lang.Object");
var sup = String.class.getSuperclass();
var entries = new ZipFile("/etc/filename.zip").entries();
var joiner = Collectors.joining(" - \n", "<start>", "<end>");
var plusCollector = Collectors.reducing(BigInteger.ZERO, BigInteger::add);
var future = Executors.newCachedThreadPool().submit(System::gc);
void m(MethodType type) { var ret = type.returnType(); }
void m(TreeSet<String> set) { var comparator = set.comparator(); }
void m(Annotation ann) { var annClass = ann.annotationType(); }
void m(ReferenceQueue<String> queue) { var stringRef = queue.poll(); }

Using wildcards in a return type is sometimes discouraged, but other times it's 
the right thing to do.  So while I wouldn't say these methods are pervasive, 
there are quite a few of them (especially where the common idiom is to almost 
always use a wildcard, as in Class and Collector).

There are no capture variables present for methods that return arrays, lists, 
etc., of wildcard-parameterized types, because capture doesn't touch those 
nested wildcards:

void m(MethodType type) { var params = type.parameterArray(); }
void m(MethodType type) { var params = type.parameterList(); }
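
To make the "nested wildcards are untouched" point concrete, here is a small sketch without 'var' (the class and method names are made up for illustration). Because the declared return types have no top-level wildcard, they can be written out exactly in an explicit declaration:

```java
import java.lang.invoke.MethodType;
import java.util.List;

class NestedWildcards {
    // parameterList() is declared to return List<Class<?>>. The wildcard is
    // nested inside Class, so capture leaves it alone and the exact type
    // can be written in source.
    static List<Class<?>> params(MethodType type) {
        List<Class<?>> params = type.parameterList();
        return params;
    }

    // Same story for the array-returning variant.
    static Class<?>[] paramArray(MethodType type) {
        Class<?>[] params = type.parameterArray();
        return params;
    }

    public static void main(String[] args) {
        MethodType type = MethodType.methodType(void.class, int.class, String.class);
        System.out.println(params(type).size());     // 2
        System.out.println(paramArray(type).length); // 2
    }
}
```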

== Case 2: instance method returning a class type parameter ==

A method (or field) whose return type is a class type parameter will produce a 
capture variable when invoked for a wildcard-parameterized type.

void m(Class<? extends Runnable> c) throws Exception { var runnable = c.newInstance(); }
void m(Map<String, ? extends Throwable> map) { var e = map.get("some.key"); }
void m(List<? extends Set<String>> sets) { var first = sets.get(0); }
Object find(Collection<?> coll, Object o) { for (var elt : coll) { if (elt.equals(o)) return elt; } return null; }
void m(Optional<? extends Number> opt) { var num = opt.get(); }
void m(IntFunction<? extends Reader> f) { var reader = f.apply(14); }
void m(Future<? extends ZipEntry> future) { var entry = future.get(10, TimeUnit.SECONDS); }

If you substitute a wildcard-parameterized type into the return type, that also 
leads to capture:

void m(List<Set<? extends Number>> list) { var set = list.get(0); }

This is true for for-each, too (for now, javac fails to perform capture 
correctly, so you don't see this in the prototype):

void m(List<Set<? extends Number>> list) { for (var set : list) set.clear(); }

== Case 3: instance method returning a type that mentions a class type parameter ==

A method (or field) whose return type *mentions* a class type parameter (e.g., 
Iterator<E> in Iterable.iterator) will also produce a non-denotable type when 
invoked for a wildcard-parameterized type.  Unlike Case 2 methods, which tend 
to be "terminal operations", these types often arise in chains.

var constructor = Class.forName("java.lang.Object").getConstructor();
void m(Map<? extends Number, String> map) { var keys = map.keySet(); }
void m(Map<? extends Number, String> map) { var iter = map.keySet().iterator(); }
void m(TreeMap<String, ? extends Throwable> map) { var tail = map.subMap("b", "c"); }
void m(TreeSet<String> set) { var reverseOrder = set.comparator().reversed(); }
void m(List<? extends Number> list) { var unique = list.stream().distinct().sorted(); }
void m(Stream<? extends Throwable> stream) { var best = stream.min(Comparator.comparing(e -> e.getStackTrace().length)); }
void m(Function<? super String, File> f1, Function<? super File, Integer> f2) { var f = f1.andThen(f2); }
void m(Predicate<? super File> discard) { var keep = discard.negate(); }

== Case 4: method with inferred type parameter in return type ==

A method (or constructor) whose return type includes an inferred type parameter 
may end up substituting capture variables or other non-denotable types.  This 
typically depends on the types of the arguments, again with a 
wildcard-parameterized type showing up somewhere.

void m(Enumeration<? extends Runnable> tasks) { var list = Collections.list(tasks); }
void m(Set<?> set) { var syncSet = Collections.synchronizedSet(set); }
void m(Function<? super String, ? extends Throwable> f) { var es = Stream.of("a", "b", "c").map(f); }

There are also cases here that are specified to produce capture vars but do not 
in javac:

void m(List<? extends Number> ns) { var firstSet = Collections.singleton(ns.get(0)); }

----------------

With that in mind, looking at our three options for dealing with capture 
variables:
1) Allow the non-denotable type
2) Map the type to a supertype that is denotable
3) Report an error

(3) isn't viable. "You can't use 'var' with 'getClass'" is already pretty bad. 
Prohibiting all the uses above would be really bad.

We've thought a lot about (1) and (2). The JEP includes this example:

void test(List<?> l1, List<?> l2) {
    var l3 = l1; // List<CAP> or List<?>?
    l3 = l2; // error?
    l3.add(l3.get(0)); // error?
}

On 'l3 = l2': I wouldn't say it's an important priority that all 'var' 
variables have a type that is convenient for future mutation. But we do expect 
users to be able to easily see *why* an assignment wouldn't be allowed. 
Unfortunately, capture variables are such a subtle thing that they're often 
invisible, and programmers don't even realize that they appear as an 
intermediate step. So, most people would see 'var l3 = l1' and expect that the 
type of l3 is List<?>.

On 'l3.add(l3.get(0))': This is a cool trick. The use of 'var' essentially 
serves the same purpose as invoking a generic method in order to give a capture 
variable a name:

<T> void dupFirst(List<T> list) { list.add(list.get(0)); }
...
dupFirst(l1);

On the other hand, it's a subtle trick, and the average user isn't going to 
understand what's going on. (Or, more likely: 'l3.add(l3.get(0))' looks fine to 
them, but they won't understand why it stops working when that gets refactored 
to 'l1.add(l1.get(0))'.)
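
For reference, the generic-method version of the trick can be tried without 'var' at all. This is only a minimal sketch; the class name and the 'dupFirstOf' wrapper are hypothetical, added so the example is self-contained:

```java
import java.util.ArrayList;
import java.util.List;

class CaptureHelper {
    // T gives the capture variable a name for the duration of the call,
    // so the element read from the list can be added back to it.
    static <T> void dupFirst(List<T> list) {
        list.add(list.get(0));
    }

    // Writing the body inline does not compile:
    //   l1.add(l1.get(0));
    // Each occurrence of l1 is captured separately, so the element type of
    // the second capture is not known to match the first.
    static void dupFirstOf(List<?> l1) {
        dupFirst(l1);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("a", "b"));
        dupFirstOf(names);
        System.out.println(names); // [a, b, a]
    }
}
```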

So, in terms of user experience, it seems like (2) is the desired outcome here.

That choice isn't without some sacrifice: it would be a nice property if 
lifting a subexpression out of an expression into its own 'var' declaration 
yields identical types. Since (2) changes the intermediate type, that doesn't 
hold. That said, hopefully our mapping function is reasonably unobtrusive...

How do we define the mapping? "Use the bound" is the easy answer, although in 
practice it's more complicated than that:
- Which bound? (upper or lower?)
- What if the bound contains the capture var?
- What do you do with a capture variable appearing as an (invariant) type 
argument?
- What do you do with a capture variable appearing as a wildcard bound?
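
To give a feel for these questions, here is a hand-written sketch of denotable supertypes a programmer might declare today in three of these situations. It is an illustration under my own assumptions, not the mapping itself: the class and method names are made up, the chosen types are just one plausible answer each, and the wildcard-bound case is left out.

```java
import java.util.ArrayList;
import java.util.List;

class DenotableSupertypes {
    // Upper bound: the element type is CAP extends Number, and Number is a
    // denotable supertype of it.
    static Number upper(List<? extends Number> ns) {
        Number first = ns.get(0);
        return first;
    }

    // Lower bound only: CAP super Integer has upper bound Object, so Object
    // is the supertype that can be written here.
    static Object lower(List<? super Integer> sink) {
        Object first = sink.get(0);
        return first;
    }

    // Capture var as a type argument: subList returns List<CAP>, which can
    // be widened to a wildcard-parameterized type.
    static List<? extends Number> typeArgument(List<? extends Number> ns) {
        List<? extends Number> head = ns.subList(0, 1);
        return head;
    }

    // Bound mentions the capture var: here CAP extends Enum<CAP>, and the
    // declaration falls back to a nested wildcard.
    static Class<? extends Enum<?>> selfBound(Enum<?> e) {
        Class<? extends Enum<?>> declaring = e.getDeclaringClass();
        return declaring;
    }

    public static void main(String[] args) {
        List<Number> nums = new ArrayList<>();
        nums.add(7);
        System.out.println(upper(List.of(1, 2)));
        System.out.println(lower(nums));
        System.out.println(typeArgument(List.of(1, 2)).size());
        System.out.println(selfBound(Thread.State.NEW) == Thread.State.class);
    }
}
```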

We're working on finalizing the details. While this operation isn't trivial, it 
turns out it's pretty important: we already need it to solve bugs in the type 
system involving type inference [1] and lambda expressions [2]. It's a useful 
general-purpose tool.

—Dan

[1] https://bugs.openjdk.java.net/browse/JDK-8016196
[2] https://bugs.openjdk.java.net/browse/JDK-8170887
