Pattern Matching

Remi Forax Sat, 18 Mar 2017 10:22:35 -0700

Hi guys,
i've already implemented a kind of pattern matching in a language that run on 
the JVM (still closed source :( ),
so i've said to Brian that i will send a summary of how it is done.


please do not care about the syntax, it's just for explanation :)

regards,
Rémi

---
Pattern Matching

1. expression problem

  polymorphism:
    interface I { String m(); }
    class A implements I {
       String m() { return "A::m"; }
    }
    class B implements I {
       String m() { return "B::m"; }
    }

  pattern matching:
    static String m(I i) {
       switch(i) {
         case A a:
           return "A::m";
         case B b:
           return "B::m";
         default:
           throw new AssertionError();
       } 
    }

  
  polymorphism => easy to add a new subtype of I, can not add a new operation
  pattern matching => easy to add a new opertion, can not add a new subtype

  said differently, if the hierarchy is open => use polymorphism, if the 
hierarchy is closed => use pattern matching 

  The expression problem, Philip Wadler => you can not have both and type 
safety.

  Other ways:
    - The visitor pattern uses the double dispatch technique to implement the 
pattern matching,
      the visitor interface has to list all the methods so it's equivalent to 
closed the hierarchy.

    - A hashmap of lambda [1], extensible in both direction but it required an 
unsafe cast.

    - multi dispatch like in CLOS, Dylan or Clojure, not typesafe.


2. smart cast / flow type inference

  class A {
    int value;
  }
  class B {
    String s;
  }

  Introducing a new name/local variable in the case allows to avoid unnecessary 
casts,
  so instead of

    switch(val) {
      case A:
        return ((A)a).value;
      case B:
        return ((B)b).s.length();
    }

  one can write:

    switch(val) {
      case A a:
        return a.value;
      case B b:
        return b.s.length();
    }

  the other solution is to specify a kind of flow inference, i.e. inside the 
case A, val is now typed as 'A', and inside of case B, val is now typed as 'B'. 

    switch(val) {
      case A:   // here val is now a A
        return val.value;
      case B:   // here val is now a B
        return val.s.length();
    }

   This doesn't work well if you can fallthough from case A to case B (more on 
this later) and in term of implementation, it's better for the debugger
   to create a new variable inside the case, so the generated code is more like:

   switch(val) {
      case A:
        A $val = (A)val;
        return $val.value;
      case B:
        B $val = (B)val;
        return $val.s.length();
    }
  

2. block vs expression

  When switching on type, being able to fallthrough from a case to another is 
less useful, because it's only safe if the type of the followup case is a super 
type.

    switch(val) {
       case Foo:
       case Bar:   // here Bar as to be a supertype of Foo.
    }

  Moreover, because of the separate compilation, the relationship between Foo 
and Bar can change after the code that containing the switch is compiled,
  do Bar being a supertype of Foo as to be verified (once) at runtime or by the 
bytecode verifier.

  So instead of using a C like switch based on statements, it's better a 
construction based on expression like the match in Scala or the when in Kotlin,
  something like
    match(val) {
      case A a -> a.value;
      case B b -> b.s.length();
    }


3. generalized cases

  Instead of a switch on type, we can go a step further and allow any arbitrary 
tests, like by example allowing comparison by value,
  by example in Java, because unlike in Scala there is no hierachy to 
distinguish if an optional is present or not,
  with a match that allows comparison on value, we can write
  
    match(opt) {
      case Optional.empty() -> "empty"
      case Optional opt -> opt.get()
    }

  One problem with this construct is that unlike the match on type, here, there 
is an implicit order between the cases, e.g. at runtime the cases
  as to be tested in the order like a cascade of if/else. From the performance 
point of view, the drawback is that if you have a big hierachy,
  and want to do a switch on type, having a linear test is slow. From my own 
experience, it's a serious issue when you try to use a switch on type
  with that perf property in a compiler. 

  A solution i've used in one language is to force users to define all the case 
on values first and then the case on types,
  at runtime, the case on values were executed with a cascade of if/else and 
the cases on types were executed using a if/else or an hashmap
  depending on the number of cases.


4. structural matching
  
  One problem of the pattern matching, is that because it's not expressed 
inside a hierarchy, it ruins the encapsulation, so either your user
  has to write getters or you have to expose the structure of the class to the 
pattern matching.
  Case class in Scala or record in C# 7 [2] are special constructs that expose 
the structure of a class, so the case of a match can specified in term
  of destructured local variable 

    interface I { }
    case class A(int value) implements I;
    case class B(String s, String s2) implements I;

    match(i) {
      case A(val) -> val
      case B(s, _) -> s
    }

  Both C# and Scala allow user to define how the matching is done, C# relies on 
the is operator which uses out parameters to extract the values
  and Scala uses an extractor method (unapply) which uses tuples to extract the 
values. Java has none of this mechanism, so either we introduce
  a mechanism that provides a structural definition + getters or we wait the 
introduction of value types (and tuples).

  A simple proposal for a structural definition of a class:

    structural class B(final String s, final String s2) implements I;

  which is equivalent to
    /*structural*/ class B {
      private final String s;
      private final String s2;

      public B(String s, String s2) {
        this.s = s;
        this.s2 = s2;
      }

      public String getS() { return s; }
      public String getS2() { return s2; }

      + equals and hashCode

      + a StructuralAttribute that describe the attributes s and s2.
    }

  if a body is declared, then nothing is generated by the compiler apart the 
StructuralAttribute
  and it's an error to not provide the getters, equals and hashCode. 

    structural class B(final String s, final String s2) implements I {
       // need a getS and getS2 and equals and hashCode
    }

  Again, it's maybe better to wait value types instead of relying on getters.


5. Compilation of a pattern matching

  As said previously, a switch on types should not be compiled to a cascade of 
if/else.
  - It's dog slow if you have more than a dozen of cases, because it's a linear 
scan and each instanceof is in the best case also translated to an if/else by 
the JIT
  - It doesn't work well with separate compilation, because if/else cascade is 
a sequence so defines an order.

  The solution is to use invokedynamic, obviously :)

  The recipe of a pattern matching can be describe as an array of couple of 
pattern/action,
  the pattern is:
    a test to a constant (static final field)
    a test to an argument, argument are like captured argument of a lambda pass 
as argument of invokedynamic
    a test of a type (a class)
    a destructured definition (a string encoding the matching tree indicating 
the extracted values)
  an action is:
    a constant method handle that reference a static method that takes as 
parameter the local variable defines by the pattern.

  Note: in my language, i've encoded the receipe as a JSON like string 
uuencoded seen as a constant String because there is no way to definition a 
tree like constant in the bytecode (yet !).
  
  Examples:
    match(opt) {
      case Optional.empty() -> "empty"
      case Optional opt -> opt.get()
    }
   can be encoded as
     invokedynamic(opt)  (Ljava/util/Optional;)Ljava/lang/String;

     with the recipe:
       constant: method: Ljava/lang/Optional;.empty()Ljava/lang/Optional;
       type: Optional.class

  and
    match(i) {
      case A(val) -> val
      case B(s, _) -> s
    }
   can be encoded as
     invokedynamic(i)  (LI;)Ljava/lang/Object;

     with the receipe:
       structural: LA;(?)
       structural: LB;(?_)
   
     (where ? means capture and _ forget)


  At runtime, in the bootstrap method, the type of the constant, the type of 
the argument, the type of the test type and the type of the structural matching 
are checked to verify that
  they are subtype of the first parameter of invokedynamic, otherwise an Error 
is raised.

  The test to a constant and the test to an argument are created as 
guardWithTest eagerly. The test on types are created lazily when a new class is 
found, if more that one pattern match,
  an error is reported otherwise, a guardWithTest is created and if there are 
two many guardWithTest (12 currently), a ClassValue is used instead.

  GWT are inserted the one after the others, a first prototype was trying to 
profile the branch taken to re-balance the tree is necessary but sometimes c1 
was able to compile the code
  that was doing the profiling before the code that was doing the re-balance 
was called leading to a stupid code invalidation.
  A specific MethodHandle could do a better job here !


[1] 
https://github.com/forax/design-pattern-reloaded/blob/master/src/main/java/visitor/visitor6.java#L28
[2] 
https://github.com/dotnet/roslyn/blob/features/records/docs/features/records.md

Pattern Matching

Reply via email to