On Fri, Jan 27, 2017 at 4:00 PM, Tamar Christina
<tamar.christ...@arm.com> wrote:
> Hi All,
>
> I am trying to understand how match.pd works as I'm writing a simple matching 
> rule but have run into some issues
> and there's very little documented on match.pd.
>
> short version:
>
> 1) Why is there a difference in expressiveness between the GIMPLE and the 
> GENERIC
>    versions of match.pd. Particularly why does GENERIC has the limitation 
> that it must
>    be an internal function while GIMPLE is happy to just do the substitution.

I think you run into the limitation that during GENERIC we simply
don't know reliably which functions we may
emit calls to.

> 2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the 
> Fortran version?
>    I have created an example where the GIMPLE trees of the two match exactly 
> % some attributes
>    on the expressions.

Fortran (and other frontends besides C-family) do not get their
builtins from builtins.def but need to
provide their own version.  See fortran/mathbuiltins.def.

> 3) It seems to be either or, you can't have both rules? e.g. I either match 
> in GENERIC or GIMPLE
>    but not both? Is this correct? Right now if some other optimization makes 
> it produce x*cos(x) then
>    this will never be matched again? Since the GIMPLE passes don't seem to 
> like the internal builtin?

That's not correct.  You always match both unless you disable one or
the other explicitely.  GIMPLE passes
should use cfn_ throughout, otherwise it's a bug.

> 4) There seems to be a bug with matching in GENERIC pass and when type 
> conversions are being done. I
>    am not sure who is wrong though. In the TREE the type has been correctly 
> casted but it results in a segfault.
>
> Thanks,
> Tamar
>
> ----
>
> For the long version and to see my current understanding of match.pd and what 
> I have done:
>
> Let’s say I want to transform occurrences of `x*cos(x)` with a super-dooper 
> optimized builtin `foo`.
> So `x*cos(x) => foo(x)`.
>
> I start off at the obvious place, `match.pd`.
>
> Since I need a new builtin I have to first define it in `builtins.def`. Since 
> I’m replacing a math function I want optimized versions for `float`, 
> `double`, etc.
>
> DEF_GCC_BUILTIN        (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN        (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN        (BUILT_IN_FOOL, "fool", 
> BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
> #define FOO_TYPE(F) BT_FN_##F##_##F##_##F
> DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> #undef FOO_TYPE
>
> So now I have the builtins defined (I need to also define the proper handling 
> code in builtins.c but that’s unrelated to the question) and can define the 
> match rule:
>
> /* x*cos(x) -> foo(x).  */
> (for coss (COS)
>  (simplify
>   (mult:c (coss:s @0) @1)
>   ...
>  )
> )
>
> And the question is what goes into the … block.
>
> I have builtins defined for each of the functions, so one implementation 
> could be:
>
> /* x*cos(x) -> foo(x).  */
> (for coss (COS)
>  (simplify
>   (mult:c (coss:s @0) @1)
>   (switch
>     (if (types_match (type, float_type_node))
>      (BUILT_IN_FOOF @1 @0))
>     (if (types_match (type, double_type_node))
>      (BUILT_IN_FOO @1 @0))
>     (if (types_match (type, long_double_type_node))
>      (BUILT_IN_FOOL @1 @0))
>   )
>  )
> )
>
> But one of the artifacts of defining the builtin in `builtins.def` is that an 
> iterator is created for you for use in `match.pd` by `gencfn-macros.c`, in 
> this case I have something like:
>
>      (define_operator_list FOO
>          BUILT_IN_FOOF
>          BUILT_IN_FOO
>          BUILT_IN_FOOL
>          null)
>
> The last `null` is important but I'll get to that later. These are generated 
> and placed in `cfn-operators.pd` which is
> included by `match.pd`.
>
> The other artifact is a macro, for use in case statements in e.g. builtin.c, 
> e.g. `CASE_CFN_FOO`
> Which expands to:
>
>       case CFN_BUILT_IN_FOOF:
>       case CFN_BUILT_IN_FOOT:
>       case CFN_BUILT_IN_FOOL:
>       case CFN_FOO:
>
> These are handy, because it means we can easily write match statements to the 
> correct type. The iterator items are generated in increasing order of width 
> of the type. E.g. float, double, long double.
>
> So by iterating over both `COS` and the `FOO` iterator at the same time, we 
> can simplify the the match rule:
>
> /* x*cos(x) -> foo(x).  */
> (for coss (COS)
>      foos (FOO)
>  (simplify
>   (mult:c (coss:s @0) @1)
>   (foos @0)
>  )
> )

Looks good sofar.  But note that the FOOs are likely not !
builtin_decl_implicit and thus
we won't generate calls to them unless calls to them are already
present in the program
(this is determined during gimplification thus _after_ GENERIC).

> This rule gets compiled into two files `generic-match.c` and `gimple-match.c`.
> The first working on GENERIC tree and the second ofcourse on GIMPLE trees.
>
> Given that GIMPLE and GENERIC have slightly different semantics the code for 
> these
> two are fairly different. The placement of the rule in `match.pd` seems to be 
> important,
> the rules are applied in order of their specification in `match.pd`.
>
> The GIMPLE matcher will generate statements such as
>
>             switch (gimple_call_combined_fn (def))
>               {
>               case CFN_BUILT_IN_COSF:
>                 {
>                   tree o20 = gimple_call_arg (def, 0);
>                   if ((o20 = do_valueize (valueize, o20)))
>                     {
>                       {
>                        ...
>                         *res_code = CFN_BUILT_IN_FOOF;
>                         res_ops[0] = captures[1];
>                         gimple_resimplify1 (lseq, res_code, type, res_ops, 
> valueize);
>                         return true;
>                       }
>                     }
>                   break;
>                 }
>               case CFN_BUILT_IN_COSS:
>                 {
>                   tree o20 = gimple_call_arg (def, 0);
>                   if ((o20 = do_valueize (valueize, o20)))
>                     {
>                       {
>                        ...
>                         *res_code = CFN_BUILT_IN_FOO;
>                         res_ops[0] = captures[1];
>                         gimple_resimplify1 (lseq, res_code, type, res_ops, 
> valueize);
>                         return true;
>                       }
>                     }
>                   break;
>                 }
>               ...
>               default:;
>               }
>         }
>
> Because of the order of the iterations, the correct function is matched. e.g. 
> COSF -> FOOF,
> COS -> FOO, etc.
>
> The GENERIC version has the same global structure but differs vastly in how 
> replacements are done:
>
>    case CALL_EXPR:
>       switch (get_call_combined_fn (op0))
>         {
>         case CFN_BUILT_IN_COSF:
>           {
>             tree o20 = CALL_EXPR_ARG (op0, 0);
>             {
>              ...
>               res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type, 
> 1, res_op0);
>               if (!res)
>                 return NULL_TREE;
>               if (TREE_SIDE_EFFECTS (captures[2]))
>                 res = build2_loc (loc, COMPOUND_EXPR, type, 
> fold_ignored_result (captures[2]), res);
>               return res;
>             }
>             break;
>           }
>         case CFN_BUILT_IN_COSS:
>         ...
>
> The important function here is `maybe_build_call_expr_loc`, again, I'll get 
> to that later.
>
> Testing it out, given a C input file:
>
> #include <math.h>
>
> float do_cool_stuff(float x)
> {
>    return x*cos(x);
> }
>
> we get:
>
> do_cool_stuff (float x)
> {
>   float D.4154;
>
>   _1 = (double) x;
>   _2 = __builtin_foo (_1);
>   D.4154 = (float) _2;
>   return D.4154;
> }
>
> The rule seems to be matching early in the GIMPLE phase and not the generic, 
> fair enough as long as it works.

Btw, why are you interested in GENERIC matching at all here?

> It also correctly adds type casting and calling the double version of `foo`.
>
> Fortran however disagrees:
>
> For my Fortran function
>
>  function func()
>     real :: j ! output
>     j = j * cos(j)
>  end function func
>
> I get
>
> func ()
> {
>   real(kind=4) j;
>
>   _1 = __builtin_cosf (j);
>   j = j * _1;
> }
>
> So fortran is not applying the rule on neither GENERIC nor GIMPLE.

See above.

> I am not sure why the GIMPLE one does not apply, but the GENERIC one doesn't 
> apply
> for neither Fortran nor C because of an implicit constrains on 
> `generic-match.c` which comes
> from `maybe_build_call_expr_loc`.
>
> This function will only allow the substitution if the function to be 
> substituted to is an internal builtin.
> Presumably because it wants to prevent introducing a function that it cannot 
> fold or lower later. As this might
> introduce a link error later if it can't remove it.

No, it also allows other substitution:

tree
maybe_build_call_expr_loc (location_t loc, combined_fn fn, tree type,
                           int n, ...)
{
  va_list ap;
  tree *argarray = XALLOCAVEC (tree, n);
  int i;

  va_start (ap, n);
  for (i = 0; i < n; i++)
    argarray[i] = va_arg (ap, tree);
  va_end (ap);
  if (internal_fn_p (fn))
    {
...
    {
      tree fndecl = builtin_decl_implicit (as_builtin_fn (fn));
      if (!fndecl)
        return NULL_TREE;
      return build_call_expr_loc_array (loc, fndecl, n, argarray);

but subject to the same implicit rule as GIMPLE (but handled more
conservatively as I outlined above).

> Fair enough, the way to tell it that it can do get rid of the function is by 
> adding it to `internal-fn.def`.
>
> A comment in the file states that:
>
>    DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
>    the function implements the computational part of a built-in math
>    function BUILT_IN_<NAME>{F,,L}.
>
> Which is exactly what I need.
>
> So to `internal-fn.def` I add a new entry
>
> DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary)
>
> Which now in order for it to work requires the optab `foo` to be defined. So 
> in `optabs.def`
> I add:
>
> OPTAB_D (foo_optab, "foo$F$a2")
>
> As the comment suggests I expected a mapping to be done between the typed 
> functions.
> e.g. I expected there to be 3 new internal functions, one for float, double 
> and long double.
>
> However only one new builtin is generated `IFN_FOO`. And this is added in the 
> iterator for FOO
> in place of the `(null)` we saw earlier:
>
>      (define_operator_list FOO
>          BUILT_IN_FOOF
>          BUILT_IN_FOO
>          BUILT_IN_FOOL
>          IFN_FOO)

internal functions are untyped and thus we use them as overloads for
all arg types.

> Re-running the fortran example it still does not match. This is because the 
> `cos` is
> expanded into `cosf` and then matches against `BUILT_IN_FOOF` and not the 
> internal function.
> So generic-match.c still refuses to match it.
>
> The only way I have found to make it match is to change the matching rule to:
>
> /* x*cos(x) -> foo(x).  */
> (for coss (COS)
>  (simplify
>   (mult:c (coss:s @0) @1)
>   (IFN_FOO @0)
>  )
> )
>
> e.g. Map all the rules to the type generic IFN_FOO.
>
> Running the example now still produces nothing, this is because if I 
> understood correctly the rewrite
> is only done *if* an optab for the required type actually exists. So I create 
> one in the backend.
>
> I create an obtab for both double and float types (doing so in the aarch64 
> backend):
>
> (define_insn "foo<mode>2"
>   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
>         (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
>   "TARGET_FLOAT"
>   "fabs\\t%<s>0, %<s>1"
>   [(set_attr "type" "ffarith<stype>")]
> )
>
> The operation is non-sense but that doesn't matter, we just care about the 
> gimple:
>
> func ()
> {
>   real(kind=4) j;
>
>   j = FOO (j);
> }
>
>
> So the fortan code has correctly replaced `x*cos(x)` with `FOO`.
>
> C however now crashes with a segfault:
>
> cos.c: In function ‘do_cool_stuff’:
> cos.c:5:4: internal compiler error: Segmentation fault
>     return x*cos(x);
>     ^~~~~~
> 0xb55ddf crash_signal
>         /work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333
> 0x6f227c builtin_mathfn_code(tree_node const*)
>         /work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532
> 0x76070e convert_to_real_1
>
> This happens
>
>   if (TREE_CODE (t) != CALL_EXPR
>       || TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)
>
> here because `CALL_EXPR_FN (t)` returns NULL. This is because one of the 
> operators
> in the operators list is null. TREE_CODE just blindly dereferences it. This 
> seems to
> be a bug, however I am not sure where. If the bug is in `builtin_mathfn_code` 
> or in the
> code that generated the tree in the first place. Note that this does not 
> happen if
> type casts are not being done.

Yes, the FEs likely do not expect internal FNs around in the IL at
random places.

So first of all I suggest to not try matching on GENERIC (or at least
not spend too much
energy in making it work there).

Richard.

> If I change
>
> float do_cool_stuff(float x)
> {
>    return x*cos(x);
> }
>
> to
>
> float do_cool_stuff(float x)
> {
>    return x*cosf(x);
> }
>
> The type error can be fixed by adding an extra check:
>
>   if (TREE_CODE (t) != CALL_EXPR
>       || (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR))
>
> to `builtin_mathfn_code` and this seems to work fine and I think is correct, 
> as this function is allowed to say NO.

Reply via email to