Questions about match.pd

Tamar Christina Fri, 27 Jan 2017 07:02:00 -0800

Hi All,

I am trying to understand how match.pd works as I'm writing a simple matching 
rule but have run into some issues
and there's very little documented on match.pd.


short version:

1) Why is there a difference in expressiveness between the GIMPLE and the 
GENERIC
   versions of match.pd. Particularly why does GENERIC has the limitation that 
it must
   be an internal function while GIMPLE is happy to just do the substitution.

2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the Fortran 
version?
   I have created an example where the GIMPLE trees of the two match exactly % 
some attributes
   on the expressions.

3) It seems to be either or, you can't have both rules? e.g. I either match in 
GENERIC or GIMPLE
   but not both? Is this correct? Right now if some other optimization makes it 
produce x*cos(x) then
   this will never be matched again? Since the GIMPLE passes don't seem to like 
the internal builtin?

4) There seems to be a bug with matching in GENERIC pass and when type 
conversions are being done. I
   am not sure who is wrong though. In the TREE the type has been correctly 
casted but it results in a segfault.

Thanks,
Tamar

----

For the long version and to see my current understanding of match.pd and what I 
have done:

Let’s say I want to transform occurrences of `x*cos(x)` with a super-dooper 
optimized builtin `foo`.
So `x*cos(x) => foo(x)`.

I start off at the obvious place, `match.pd`.

Since I need a new builtin I have to first define it in `builtins.def`. Since 
I’m replacing a math function I want optimized versions for `float`, `double`, 
etc.

DEF_GCC_BUILTIN        (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN        (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN        (BUILT_IN_FOOL, "fool", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
#define FOO_TYPE(F) BT_FN_##F##_##F##_##F
DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
#undef FOO_TYPE

So now I have the builtins defined (I need to also define the proper handling 
code in builtins.c but that’s unrelated to the question) and can define the 
match rule:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  ...
 )
)

And the question is what goes into the … block.

I have builtins defined for each of the functions, so one implementation could 
be:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (switch
    (if (types_match (type, float_type_node))
     (BUILT_IN_FOOF @1 @0))
    (if (types_match (type, double_type_node))
     (BUILT_IN_FOO @1 @0))
    (if (types_match (type, long_double_type_node))
     (BUILT_IN_FOOL @1 @0))
  )
 )
)

But one of the artifacts of defining the builtin in `builtins.def` is that an 
iterator is created for you for use in `match.pd` by `gencfn-macros.c`, in this 
case I have something like:

     (define_operator_list FOO
         BUILT_IN_FOOF
         BUILT_IN_FOO
         BUILT_IN_FOOL
         null)

The last `null` is important but I'll get to that later. These are generated 
and placed in `cfn-operators.pd` which is
included by `match.pd`.

The other artifact is a macro, for use in case statements in e.g. builtin.c, 
e.g. `CASE_CFN_FOO`
Which expands to:

      case CFN_BUILT_IN_FOOF:
      case CFN_BUILT_IN_FOOT:
      case CFN_BUILT_IN_FOOL:
      case CFN_FOO:

These are handy, because it means we can easily write match statements to the 
correct type. The iterator items are generated in increasing order of width of 
the type. E.g. float, double, long double.

So by iterating over both `COS` and the `FOO` iterator at the same time, we can 
simplify the the match rule:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
     foos (FOO)
 (simplify
  (mult:c (coss:s @0) @1)
  (foos @0)
 )
)

This rule gets compiled into two files `generic-match.c` and `gimple-match.c`.
The first working on GENERIC tree and the second ofcourse on GIMPLE trees.

Given that GIMPLE and GENERIC have slightly different semantics the code for 
these
two are fairly different. The placement of the rule in `match.pd` seems to be 
important,
the rules are applied in order of their specification in `match.pd`.

The GIMPLE matcher will generate statements such as

            switch (gimple_call_combined_fn (def))
              {
              case CFN_BUILT_IN_COSF:
                {
                  tree o20 = gimple_call_arg (def, 0);
                  if ((o20 = do_valueize (valueize, o20)))
                    {
                      {
                       ...
                        *res_code = CFN_BUILT_IN_FOOF;
                        res_ops[0] = captures[1];
                        gimple_resimplify1 (lseq, res_code, type, res_ops, 
valueize);
                        return true;
                      }
                    }
                  break;
                }
              case CFN_BUILT_IN_COSS:
                {
                  tree o20 = gimple_call_arg (def, 0);
                  if ((o20 = do_valueize (valueize, o20)))
                    {
                      {
                       ...
                        *res_code = CFN_BUILT_IN_FOO;
                        res_ops[0] = captures[1];
                        gimple_resimplify1 (lseq, res_code, type, res_ops, 
valueize);
                        return true;
                      }
                    }
                  break;
                }
              ...
              default:;
              }
        }

Because of the order of the iterations, the correct function is matched. e.g. 
COSF -> FOOF,
COS -> FOO, etc.

The GENERIC version has the same global structure but differs vastly in how 
replacements are done:

   case CALL_EXPR:
      switch (get_call_combined_fn (op0))
        {
        case CFN_BUILT_IN_COSF:
          {
            tree o20 = CALL_EXPR_ARG (op0, 0);
            {
             ...
              res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type, 1, 
res_op0);
              if (!res)
                return NULL_TREE;
              if (TREE_SIDE_EFFECTS (captures[2]))
                res = build2_loc (loc, COMPOUND_EXPR, type, fold_ignored_result 
(captures[2]), res);
              return res;
            }
            break;
          }
        case CFN_BUILT_IN_COSS:
        ...

The important function here is `maybe_build_call_expr_loc`, again, I'll get to 
that later.

Testing it out, given a C input file:

#include <math.h>

float do_cool_stuff(float x)
{
   return x*cos(x);
}

we get:

do_cool_stuff (float x)
{
  float D.4154;

  _1 = (double) x;
  _2 = __builtin_foo (_1);
  D.4154 = (float) _2;
  return D.4154;
}

The rule seems to be matching early in the GIMPLE phase and not the generic, 
fair enough as long as it works.
It also correctly adds type casting and calling the double version of `foo`.

Fortran however disagrees:

For my Fortran function

 function func()
    real :: j ! output
    j = j * cos(j)
 end function func

I get

func ()
{
  real(kind=4) j;

  _1 = __builtin_cosf (j);
  j = j * _1;
}

So fortran is not applying the rule on neither GENERIC nor GIMPLE.

I am not sure why the GIMPLE one does not apply, but the GENERIC one doesn't 
apply
for neither Fortran nor C because of an implicit constrains on 
`generic-match.c` which comes
from `maybe_build_call_expr_loc`.

This function will only allow the substitution if the function to be 
substituted to is an internal builtin.
Presumably because it wants to prevent introducing a function that it cannot 
fold or lower later. As this might
introduce a link error later if it can't remove it.

Fair enough, the way to tell it that it can do get rid of the function is by 
adding it to `internal-fn.def`.

A comment in the file states that:

   DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
   the function implements the computational part of a built-in math
   function BUILT_IN_<NAME>{F,,L}.

Which is exactly what I need.

So to `internal-fn.def` I add a new entry

DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary)

Which now in order for it to work requires the optab `foo` to be defined. So in 
`optabs.def`
I add:

OPTAB_D (foo_optab, "foo$F$a2")

As the comment suggests I expected a mapping to be done between the typed 
functions.
e.g. I expected there to be 3 new internal functions, one for float, double and 
long double.

However only one new builtin is generated `IFN_FOO`. And this is added in the 
iterator for FOO
in place of the `(null)` we saw earlier:

     (define_operator_list FOO
         BUILT_IN_FOOF
         BUILT_IN_FOO
         BUILT_IN_FOOL
         IFN_FOO)

Re-running the fortran example it still does not match. This is because the 
`cos` is
expanded into `cosf` and then matches against `BUILT_IN_FOOF` and not the 
internal function.
So generic-match.c still refuses to match it.

The only way I have found to make it match is to change the matching rule to:

/* x*cos(x) -> foo(x).  */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (IFN_FOO @0)
 )
)

e.g. Map all the rules to the type generic IFN_FOO.

Running the example now still produces nothing, this is because if I understood 
correctly the rewrite
is only done *if* an optab for the required type actually exists. So I create 
one in the backend.

I create an obtab for both double and float types (doing so in the aarch64 
backend):

(define_insn "foo<mode>2"
  [(set (match_operand:GPF_F16 0 "register_operand" "=w")
        (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
  "TARGET_FLOAT"
  "fabs\\t%<s>0, %<s>1"
  [(set_attr "type" "ffarith<stype>")]
)

The operation is non-sense but that doesn't matter, we just care about the 
gimple:

func ()
{
  real(kind=4) j;

  j = FOO (j);
}


So the fortan code has correctly replaced `x*cos(x)` with `FOO`.

C however now crashes with a segfault:

cos.c: In function ‘do_cool_stuff’:
cos.c:5:4: internal compiler error: Segmentation fault
    return x*cos(x);
    ^~~~~~
0xb55ddf crash_signal
        /work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333
0x6f227c builtin_mathfn_code(tree_node const*)
        /work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532
0x76070e convert_to_real_1

This happens

  if (TREE_CODE (t) != CALL_EXPR
      || TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)

here because `CALL_EXPR_FN (t)` returns NULL. This is because one of the 
operators
in the operators list is null. TREE_CODE just blindly dereferences it. This 
seems to
be a bug, however I am not sure where. If the bug is in `builtin_mathfn_code` 
or in the
code that generated the tree in the first place. Note that this does not happen 
if
type casts are not being done.

If I change

float do_cool_stuff(float x)
{
   return x*cos(x);
}

to

float do_cool_stuff(float x)
{
   return x*cosf(x);
}

The type error can be fixed by adding an extra check:

  if (TREE_CODE (t) != CALL_EXPR
      || (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR))

to `builtin_mathfn_code` and this seems to work fine and I think is correct, as 
this function is allowed to say NO.

Questions about match.pd

Reply via email to