[jira] [Comment Edited] (GROOVY-5490) Double Dispatch Distance Algorithm Incorrectly Implemented

Jochen Theodorou (Jira) Sat, 08 Jan 2022 03:24:07 -0800


    [ 
https://issues.apache.org/jira/browse/GROOVY-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471102#comment-17471102
 ]


Jochen Theodorou edited comment on GROOVY-5490 at 1/8/22, 11:23 AM:
--------------------------------------------------------------------

Matching the first argument more than the later arguments is not what the 
algorithm used contains or was intended to contain.

Let me explain the ideas behind the current algorithm.... this is gonna be 
long...

let us say that we use 1 as distance (d) if one class is a direct subclass, 
then d(Integer,Object)==2, because Integer is a subclass of Number. 
d(String,Object)=1, because it is a direct subclass of Object. For foo(Object, 
Integer) with the arguments String and Integer we get 
d(String,Object)+d(Integer,Integer)==1. For foo(String,Object) with the same 
arguments String and Integer we get d(String,String)+d(Integer,Object)=2. This 
means foo(Object,Integer) should be selected according to the basics of the 
idea.

Let's alter the version above and say whenever we have to transform a type 
instance we add 1 to the distance, while a sub-classing distance is then 2. 
This means d'(int,Integer) = 1; d'(Integer,Object)=4; d'(int,Object) = 5. Since 
foo("potato", new Integer(6)) is wrongly seen as foo("potato", 6) we get 
d'(String,Object)+d'(int,Integer)==3 for foo(Object,Integer); and for 
foo(String,Object) we get d'(String,String)+d'(int,Object)=5. This would still 
select foo(Object,Integer) correctly. If foo("potato", new Integer(6)) is seen 
as foo(String,Integer) we get for foo(Object,Integer) a d'==2 and for 
foo(String,Object) a d'==4

Since there is an overlap between the distances for type transformations and 
subclasses if multiple arguments are involved one simple way to solve this 
would be to use an area just for the transformations and one for subclasses.. 
For example subclass_distances*256+primitive_distances. Which means 
d''(Integer,Object)==2*256; d''(int,Object)==2*256+1; d(String,Object)=256: Now 
calling foo(Object,Integer) with (String,int) makes a distance of 
d''(String,Object)+d''(int,Integer)==1*256+1==257. For foo(String,Object) with 
String int we get d''(String,String)+d''(int,Object)==0+2*256+1=1025. Again 
foo(Object,Integer) should be used. This implies primitive type transformations 
are "better" than subclass relations.

What if we reverse that and say type transformations should be *256 and 
subclasses stay as they are with a distance of 1: For foo(String,Object) we get 
now d'''(String,String)+d'''(Integer,Object)==4 or if the Integer is seen as 
int: 260. And foo(Object,Integer) is now 
d'''(String,Object)+d'''(Integer,Integer)==2 and with int it is 258.

Frankly it does not matter that much which one gets the multiplier, doing it 
with the type transformation is usually better, since multiplying with this 
factor means there is a maximum and type transformations are quite limited, 
while subclassing can go long ways (in theory). Of course there is more... 
there is the case of GString, that is handled as subclass of String. There are 
widening and narrowing to consider,  then there are transformations like 
calling with String for Char and varargs and null and....

Now the problem is that the distance algorithm implemented was intended to work 
like that, but does not. Because of widening a there can be a [distance of 15 
between numbers and 
Object|https://github.com/groovy/groovy-core/blob/01309f9d4be34ddf93c4a9943b5a97843bff6181/src/main/org/codehaus/groovy/runtime/MetaClassHelper.java#L213].
  This distance pd is then transformed with pd<<21. The multiplier for Object 
distances (od) is calculated by <<23. In this algorithm Long and Integer are 
seen as primitives. The object distance then starts with 17 (16 primitives 
between first subclass and Object). This means we basically get dg = Sum od + 
Sum pd, od = (17 + d)<<23 iff d>0 or 0 else and pd=tableValue<<21. For 
foo(String,Object) we get for a call with (String,Integer) 
dg=od(String,String)+pd(Integer,Object) == (0)<<23+(11)<<21 == 23068672 and for 
foo(Object,Integer) we get dg=od(String,Object)+pd(Integer,Integer) ==  
(17+3)<<23+0 == 167772160. And now suddenly foo(String,Object) is selected, 
even though, in the simplified versions above it was always 
foo(Object,Integer). And while my second number does not fit with the one you 
gave it should still not make it wrong.

The big difference is that in my examples leading to this we where always 
talking about object distances and type transformations, while here it does a 
mix of both, seeing primitives as something between Object and a first real 
subclass. That lead to String having a bigger difference to Object than for 
example Integer, even though Integer is a subclass of Number and should have a 
bigger distance. 

The choice seems strange, but let me try to explain the background ... Let us 
look at foo(Object), foo(int), foo(Integer) and a call with (Integer). 
Obviously we want to call foo(Integer). What now if we do the call with 1G. In 
Java BigInteger is a class and has nothing to do with Integer, so obviously it 
would be foo(Object). But in Groovy BigInteger is considered a primitive 
(Number too actually). According to the table I mentioned the distance here is 
now 4 to foo(int), 6 to foo(Integer) and 2 to Object. So we still select 
foo(Object). If it would not be there we would select foo(int).  Now assume we 
have foo(Integer) and foo(int) and a call foo(1). Here Groovy looks at foo(1) 
as a call with Integer, thus we would match foo(Integer). The correct match 
though would be foo(int) instead - that is unless we do not use reflection to 
get the runtime type of the argument boxed objects and instead use the unboxed 
values - invokedynamic can/could do that. If we have foo(Number) and 
foo(Object) and call with int/Integer, we of course still want to be like Java 
and call foo(Number). Going through all the cases basically is what produced 
that table in line 213 and implies special handling of Number.

I think the bug is related to the idea, that if X is not a primitive, we can 
just look at its super classes and then be done. I think what would be more 
correct would be to repeat the primitives test for super class as well (making 
the algorithm less efficient of course). In that case we could think about 
forgetting the special object multiplier and get something like this{code:Java} 
long d(Class argument, Class parameter) {
  long distance == 0
  clazz = argument
  while (clazz != null) {
     if (clazz==parameter) return distance;
     long distance = getPrimitiveDistance(clazz,parameter)
     if (distance>=0) break
     distance +=3
     clazz = clazz.superClass
  }
  if (getPrimitiveIndex(argument) == -1) distance+=16
  return distance
}{code}
Of course this is without the varargs, null and without the interfaces part, so 
I would still shift by 21 and do the other logic.... and this would still be 
not solving the problem. I does not solve it because the distance between 
Integer and Object is still only 11, while it is 17 for String. If a super 
class hop means +17, then Integer to Object should be 34. Since 16 is not used, 
let us use 16 for the class hop since it shifts better. Then the primitive 
table for Integer would look like {12, 13, 10, 11, 1, 0, 2, 3, 4, 5, 6, 7, 8, 
9, 16, 32,} and the tweaked algorithm would be:
{code:Java} 
long d(Class argument, Class parameter) {
  long distance == 0
  clazz = argument
  while (clazz != null) {
     if (clazz==parameter) return distance;
     long distance = getPrimitiveDistance(clazz,parameter)
     if (distance>=0) break
     distance +=16
     clazz = clazz.superClass
  }
  return distance
}{code}
Now the call (String,Integer) on foo(Object,Integer) would produce 16+0 and on 
foo(String,Object) it would produce 0+32 and foo(Object,Integer) would be 
selected again. Not only that, but it would also change a lot for sub classes 
of Number/BigDecimal/BigInteger in general, plus now x(1G) on x(Integer) and 
x(Object) would select suddenly x(Integer) unless the table is further changed.

Overall adapting that kind of change brings a lot of testing for special cases.


was (Author: blackdrag):
Matching the first argument more than the later arguments is not what the 
algorithm used contains or was intended to contain.

Let me explain the ideas behind the current algorithm.... this is gonna be 
long...

let us say that we use 1 as distance (d) if one class is a direct subclass, 
then d(Integer,Object)==2, because Integer is a subclass of Number. 
d(String,Object)=1, because it is a direct subclass of Object. For foo(Object, 
Integer) with the arguments String and Integer we get 
d(String,Object)+d(Integer,Integer)==1. For foo(String,Object) with the same 
arguments String and Integer we get d(String,String)+d(Integer,Object)=2. This 
means foo(Object,Integer) should be selected according to the basics of the 
idea.

Let's alter the version above and say whenever we have to transform a type 
instance we add 1 to the distance, while a sub-classing distance is then 2. 
This means d'(int,Integer) = 1; d'(Integer,Object)=4; d'(int,Object) = 5. Since 
foo("potato", new Integer(6)) is wrongly seen as foo("potato", 6) we get 
d'(String,Object)+d'(int,Integer)==3 for foo(Object,Integer); and for 
foo(String,Object) we get d'(String,String)+d'(int,Object)=5. This would still 
select foo(Object,Integer) correctly. If foo("potato", new Integer(6)) is seen 
as foo(String,Integer) we get for foo(Object,Integer) a d'==2 and for 
foo(String,Object) a d'==4

Since there is an overlap between the distances for type transformations and 
subclasses if multiple arguments are involved one simple way to solve this 
would be to use an area just for the transformations and one for subclasses.. 
For example subclass_distances*256+primitive_distances. Which means 
d''(Integer,Object)==2*256; d''(int,Object)==2*256+1; d(String,Object)=256: Now 
calling foo(Object,Integer) with (String,int) makes a distance of 
d''(String,Object)+d''(int,Integer)==1*256+1==257. For foo(String,Object) with 
String int we get d''(String,String)+d''(int,Object)==0+2*256+1=1025. Again 
foo(Object,Integer) should be used. This implies primitive type transformations 
are "better" than subclass relations.

What if we reverse that and say type transformations should be *256 and 
subclasses stay as they are with a distance of 1: For foo(String,Object) we get 
now d'''(String,String)+d'''(Integer,Object)==4 or if the Integer is seen as 
int: 260. And foo(Object,Integer) is now 
d'''(String,Object)+d'''(Integer,Integer)==2 and with int it is 258.

Frankly it does not matter that much which one gets the multiplier, doing it 
with the type transformation is usually better, since multiplying with this 
factor means there is a maximum and type transformations are quite limited, 
while subclassing can go long ways (in theory). Of course there is more... 
there is the case of GString, that is handled as subclass of String. There are 
widening and narrowing to consider,  then there are transformations like 
calling with String for Char and varargs and null and....

Now the problem is that the distance algorithm implemented was intended to work 
like that, but does not. Because of widening a there can be a [distance of 15 
between numbers and 
Object|https://github.com/groovy/groovy-core/blob/01309f9d4be34ddf93c4a9943b5a97843bff6181/src/main/org/codehaus/groovy/runtime/MetaClassHelper.java#L213].
  This distance pd is then transformed with pd<<21. The multiplier for Object 
distances (od) is calculated by <<23. In this algorithm Long and Integer are 
seen as primitives. The object distance then starts with 17 (16 primitives 
between first subclass and Object). This means we basically get dg = Sum od + 
Sum pd, od = (17 + d)<<23 iff d>0 or 0 else and pd=tableValue<<21. For 
foo(String,Object) we get for a call with (String,Integer) 
dg=od(String,String)+pd(Integer,Object) == (0)<<23+(11)<<21 == 23068672 and for 
foo(Object,Integer) we get dg=od(String,Object)+pd(Integer,Integer) ==  
(17+3)<<23+0 == 167772160. And now suddenly foo(String,Object) is selected, 
even though, in the simplified versions above it was always 
foo(Object,Integer). And while my second number does not fit with the one you 
gave it should still not make it wrong.

The big difference is that in my examples leading to this we where always 
talking about object distances and type transformations, while here it does a 
mix of both, seeing primitives as something between Object and a first real 
subclass. That lead to String having a bigger difference to Object than for 
example Integer, even though Integer is a subclass of Number and should have a 
bigger distance. 

The choice seems strange, but let me try to explain the background ... Let us 
look at foo(Object), foo(int), foo(Integer) and a call with (Integer). 
Obviously we want to call foo(Integer). What now if we do the call with 1G. In 
Java BigInteger is a class and has nothing to do with Integer, so obviously it 
would be foo(Object). But in Groovy BigInteger is considered a primitive 
(Number too actually). According to the table I mentioned the distance here is 
now 4 to foo(int), 6 to foo(Integer) and 2 to Object. So we still select 
foo(Object). If it would not be there we would select foo(int).  Now assume we 
have foo(Integer) and foo(int) and a call foo(1). Here Groovy looks at foo(1) 
as a call with Integer, thus we would match foo(Integer). The correct match 
though would be foo(int) instead - that is unless we do not use reflection to 
get the runtime type of the argument boxed objects and instead use the unboxed 
values - invokedynamic can/could do that. If we have foo(Number) and 
foo(Object) and call with int/Integer, we of course still want to be like Java 
and call foo(Number). Going through all the cases basically is what produced 
that table in line 213 and implies special handling of Number.

I think the bug is related to the idea, that if X is not a primitive, we can 
just look at its super classes and then be done. I think what would be more 
correct would be to repeat the primitives test for super class as well (making 
the algorithm less efficient of course). In that case we could think about 
forgetting the special object multiplier and get something like this{Java:code} 
long d(Class argument, Class parameter) {
  long distance == 0
  clazz = argument
  while (clazz != null) {
     if (clazz==parameter) return distance;
     long distance = getPrimitiveDistance(clazz,parameter)
     if (distance>=0) break
     distance +=3
     clazz = clazz.superClass
  }
  if (getPrimitiveIndex(argument) == -1) distance+=16
  return distance
}{code}
Of course this is without the varargs, null and without the interfaces part, so 
I would still shift by 21 and do the other logic.... and this would still be 
not solving the problem. I does not solve it because the distance between 
Integer and Object is still only 11, while it is 17 for String. If a super 
class hop means +17, then Integer to Object should be 34. Since 16 is not used, 
let us use 16 for the class hop since it shifts better. Then the primitive 
table for Integer would look like {12, 13, 10, 11, 1, 0, 2, 3, 4, 5, 6, 7, 8, 
9, 16, 32,} and the tweaked algorithm would be:
{Java:code} 
long d(Class argument, Class parameter) {
  long distance == 0
  clazz = argument
  while (clazz != null) {
     if (clazz==parameter) return distance;
     long distance = getPrimitiveDistance(clazz,parameter)
     if (distance>=0) break
     distance +=16
     clazz = clazz.superClass
  }
  return distance
}{code}
Now the call (String,Integer) on foo(Object,Integer) would produce 16+0 and on 
foo(String,Object) it would produce 0+32 and foo(Object,Integer) would be 
selected again. Not only that, but it would also change a lot for sub classes 
of Number/BigDecimal/BigInteger in general, plus now x(1G) on x(Integer) and 
x(Object) would select suddenly x(Integer) unless the table is further changed.

Overall adapting that kind of change brings a lot of testing for special cases.

> Double Dispatch Distance Algorithm Incorrectly Implemented
> ----------------------------------------------------------
>
>                 Key: GROOVY-5490
>                 URL: https://issues.apache.org/jira/browse/GROOVY-5490
>             Project: Groovy
>          Issue Type: Bug
>          Components: groovy-runtime
>    Affects Versions: 1.8.4, 2.4.3
>            Reporter: thurston n
>            Priority: Major
>
> The following script 
> {code:borderStyle=solid}
>    1.  static foo(Object o, Integer i) { "Integer won" }
> 2.  static foo(String s, Object o) { "String won" }
> assert foo("potato", new Integer(6)) =~ "Integer"
> {code}
> fails.
> According to J. Theodorou, method resolution in Groovy employs a minimum 
> _distance_ algorithm in determining which concrete method to invoke at 
> runtime.
> The _distance_ of a single method parameter being computed as:
> distance(Class runtime, Class declaredParameter) ==> difference in level in 
> inheritance tree, e.g. distance(String, Object) == 1 - 0 == 1, etc.
> therefore the total distance is just distance(param1) + distance(param2) + ...
> In the sample code above, the runtime distance of foo #1 would be 
> (distance(String, Object) + distance(Integer, Integer)) == 1 + 0 == 1
> while distance of foo #2 would be (distance(String, String) + 
> distance(Integer, Object)) == 0 + 2 == 2 (since Integer -> Number -> Object)
> Therefore foo #1 should have been selected
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (GROOVY-5490) Double Dispatch Distance Algorithm Incorrectly Implemented

Reply via email to