On Fri, Jan 27, 2017 at 4:00 PM, Tamar Christina <tamar.christ...@arm.com> wrote: > Hi All, > > I am trying to understand how match.pd works as I'm writing a simple matching > rule but have run into some issues > and there's very little documented on match.pd. > > short version: > > 1) Why is there a difference in expressiveness between the GIMPLE and the > GENERIC > versions of match.pd. Particularly why does GENERIC has the limitation > that it must > be an internal function while GIMPLE is happy to just do the substitution.
I think you run into the limitation that during GENERIC we simply don't know reliably which functions we may emit calls to. > 2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the > Fortran version? > I have created an example where the GIMPLE trees of the two match exactly > % some attributes > on the expressions. Fortran (and other frontends besides C-family) do not get their builtins from builtins.def but need to provide their own version. See fortran/mathbuiltins.def. > 3) It seems to be either or, you can't have both rules? e.g. I either match > in GENERIC or GIMPLE > but not both? Is this correct? Right now if some other optimization makes > it produce x*cos(x) then > this will never be matched again? Since the GIMPLE passes don't seem to > like the internal builtin? That's not correct. You always match both unless you disable one or the other explicitely. GIMPLE passes should use cfn_ throughout, otherwise it's a bug. > 4) There seems to be a bug with matching in GENERIC pass and when type > conversions are being done. I > am not sure who is wrong though. In the TREE the type has been correctly > casted but it results in a segfault. > > Thanks, > Tamar > > ---- > > For the long version and to see my current understanding of match.pd and what > I have done: > > Let’s say I want to transform occurrences of `x*cos(x)` with a super-dooper > optimized builtin `foo`. > So `x*cos(x) => foo(x)`. > > I start off at the obvious place, `match.pd`. > > Since I need a new builtin I have to first define it in `builtins.def`. Since > I’m replacing a math function I want optimized versions for `float`, > `double`, etc. > > DEF_GCC_BUILTIN (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE, > ATTR_CONST_NOTHROW_LEAF_LIST) > DEF_GCC_BUILTIN (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT, > ATTR_CONST_NOTHROW_LEAF_LIST) > DEF_GCC_BUILTIN (BUILT_IN_FOOL, "fool", > BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) > #define FOO_TYPE(F) BT_FN_##F##_##F##_##F > DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE, > ATTR_CONST_NOTHROW_LEAF_LIST) > #undef FOO_TYPE > > So now I have the builtins defined (I need to also define the proper handling > code in builtins.c but that’s unrelated to the question) and can define the > match rule: > > /* x*cos(x) -> foo(x). */ > (for coss (COS) > (simplify > (mult:c (coss:s @0) @1) > ... > ) > ) > > And the question is what goes into the … block. > > I have builtins defined for each of the functions, so one implementation > could be: > > /* x*cos(x) -> foo(x). */ > (for coss (COS) > (simplify > (mult:c (coss:s @0) @1) > (switch > (if (types_match (type, float_type_node)) > (BUILT_IN_FOOF @1 @0)) > (if (types_match (type, double_type_node)) > (BUILT_IN_FOO @1 @0)) > (if (types_match (type, long_double_type_node)) > (BUILT_IN_FOOL @1 @0)) > ) > ) > ) > > But one of the artifacts of defining the builtin in `builtins.def` is that an > iterator is created for you for use in `match.pd` by `gencfn-macros.c`, in > this case I have something like: > > (define_operator_list FOO > BUILT_IN_FOOF > BUILT_IN_FOO > BUILT_IN_FOOL > null) > > The last `null` is important but I'll get to that later. These are generated > and placed in `cfn-operators.pd` which is > included by `match.pd`. > > The other artifact is a macro, for use in case statements in e.g. builtin.c, > e.g. `CASE_CFN_FOO` > Which expands to: > > case CFN_BUILT_IN_FOOF: > case CFN_BUILT_IN_FOOT: > case CFN_BUILT_IN_FOOL: > case CFN_FOO: > > These are handy, because it means we can easily write match statements to the > correct type. The iterator items are generated in increasing order of width > of the type. E.g. float, double, long double. > > So by iterating over both `COS` and the `FOO` iterator at the same time, we > can simplify the the match rule: > > /* x*cos(x) -> foo(x). */ > (for coss (COS) > foos (FOO) > (simplify > (mult:c (coss:s @0) @1) > (foos @0) > ) > ) Looks good sofar. But note that the FOOs are likely not ! builtin_decl_implicit and thus we won't generate calls to them unless calls to them are already present in the program (this is determined during gimplification thus _after_ GENERIC). > This rule gets compiled into two files `generic-match.c` and `gimple-match.c`. > The first working on GENERIC tree and the second ofcourse on GIMPLE trees. > > Given that GIMPLE and GENERIC have slightly different semantics the code for > these > two are fairly different. The placement of the rule in `match.pd` seems to be > important, > the rules are applied in order of their specification in `match.pd`. > > The GIMPLE matcher will generate statements such as > > switch (gimple_call_combined_fn (def)) > { > case CFN_BUILT_IN_COSF: > { > tree o20 = gimple_call_arg (def, 0); > if ((o20 = do_valueize (valueize, o20))) > { > { > ... > *res_code = CFN_BUILT_IN_FOOF; > res_ops[0] = captures[1]; > gimple_resimplify1 (lseq, res_code, type, res_ops, > valueize); > return true; > } > } > break; > } > case CFN_BUILT_IN_COSS: > { > tree o20 = gimple_call_arg (def, 0); > if ((o20 = do_valueize (valueize, o20))) > { > { > ... > *res_code = CFN_BUILT_IN_FOO; > res_ops[0] = captures[1]; > gimple_resimplify1 (lseq, res_code, type, res_ops, > valueize); > return true; > } > } > break; > } > ... > default:; > } > } > > Because of the order of the iterations, the correct function is matched. e.g. > COSF -> FOOF, > COS -> FOO, etc. > > The GENERIC version has the same global structure but differs vastly in how > replacements are done: > > case CALL_EXPR: > switch (get_call_combined_fn (op0)) > { > case CFN_BUILT_IN_COSF: > { > tree o20 = CALL_EXPR_ARG (op0, 0); > { > ... > res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type, > 1, res_op0); > if (!res) > return NULL_TREE; > if (TREE_SIDE_EFFECTS (captures[2])) > res = build2_loc (loc, COMPOUND_EXPR, type, > fold_ignored_result (captures[2]), res); > return res; > } > break; > } > case CFN_BUILT_IN_COSS: > ... > > The important function here is `maybe_build_call_expr_loc`, again, I'll get > to that later. > > Testing it out, given a C input file: > > #include <math.h> > > float do_cool_stuff(float x) > { > return x*cos(x); > } > > we get: > > do_cool_stuff (float x) > { > float D.4154; > > _1 = (double) x; > _2 = __builtin_foo (_1); > D.4154 = (float) _2; > return D.4154; > } > > The rule seems to be matching early in the GIMPLE phase and not the generic, > fair enough as long as it works. Btw, why are you interested in GENERIC matching at all here? > It also correctly adds type casting and calling the double version of `foo`. > > Fortran however disagrees: > > For my Fortran function > > function func() > real :: j ! output > j = j * cos(j) > end function func > > I get > > func () > { > real(kind=4) j; > > _1 = __builtin_cosf (j); > j = j * _1; > } > > So fortran is not applying the rule on neither GENERIC nor GIMPLE. See above. > I am not sure why the GIMPLE one does not apply, but the GENERIC one doesn't > apply > for neither Fortran nor C because of an implicit constrains on > `generic-match.c` which comes > from `maybe_build_call_expr_loc`. > > This function will only allow the substitution if the function to be > substituted to is an internal builtin. > Presumably because it wants to prevent introducing a function that it cannot > fold or lower later. As this might > introduce a link error later if it can't remove it. No, it also allows other substitution: tree maybe_build_call_expr_loc (location_t loc, combined_fn fn, tree type, int n, ...) { va_list ap; tree *argarray = XALLOCAVEC (tree, n); int i; va_start (ap, n); for (i = 0; i < n; i++) argarray[i] = va_arg (ap, tree); va_end (ap); if (internal_fn_p (fn)) { ... { tree fndecl = builtin_decl_implicit (as_builtin_fn (fn)); if (!fndecl) return NULL_TREE; return build_call_expr_loc_array (loc, fndecl, n, argarray); but subject to the same implicit rule as GIMPLE (but handled more conservatively as I outlined above). > Fair enough, the way to tell it that it can do get rid of the function is by > adding it to `internal-fn.def`. > > A comment in the file states that: > > DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition, > the function implements the computational part of a built-in math > function BUILT_IN_<NAME>{F,,L}. > > Which is exactly what I need. > > So to `internal-fn.def` I add a new entry > > DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary) > > Which now in order for it to work requires the optab `foo` to be defined. So > in `optabs.def` > I add: > > OPTAB_D (foo_optab, "foo$F$a2") > > As the comment suggests I expected a mapping to be done between the typed > functions. > e.g. I expected there to be 3 new internal functions, one for float, double > and long double. > > However only one new builtin is generated `IFN_FOO`. And this is added in the > iterator for FOO > in place of the `(null)` we saw earlier: > > (define_operator_list FOO > BUILT_IN_FOOF > BUILT_IN_FOO > BUILT_IN_FOOL > IFN_FOO) internal functions are untyped and thus we use them as overloads for all arg types. > Re-running the fortran example it still does not match. This is because the > `cos` is > expanded into `cosf` and then matches against `BUILT_IN_FOOF` and not the > internal function. > So generic-match.c still refuses to match it. > > The only way I have found to make it match is to change the matching rule to: > > /* x*cos(x) -> foo(x). */ > (for coss (COS) > (simplify > (mult:c (coss:s @0) @1) > (IFN_FOO @0) > ) > ) > > e.g. Map all the rules to the type generic IFN_FOO. > > Running the example now still produces nothing, this is because if I > understood correctly the rewrite > is only done *if* an optab for the required type actually exists. So I create > one in the backend. > > I create an obtab for both double and float types (doing so in the aarch64 > backend): > > (define_insn "foo<mode>2" > [(set (match_operand:GPF_F16 0 "register_operand" "=w") > (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))] > "TARGET_FLOAT" > "fabs\\t%<s>0, %<s>1" > [(set_attr "type" "ffarith<stype>")] > ) > > The operation is non-sense but that doesn't matter, we just care about the > gimple: > > func () > { > real(kind=4) j; > > j = FOO (j); > } > > > So the fortan code has correctly replaced `x*cos(x)` with `FOO`. > > C however now crashes with a segfault: > > cos.c: In function ‘do_cool_stuff’: > cos.c:5:4: internal compiler error: Segmentation fault > return x*cos(x); > ^~~~~~ > 0xb55ddf crash_signal > /work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333 > 0x6f227c builtin_mathfn_code(tree_node const*) > /work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532 > 0x76070e convert_to_real_1 > > This happens > > if (TREE_CODE (t) != CALL_EXPR > || TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR) > > here because `CALL_EXPR_FN (t)` returns NULL. This is because one of the > operators > in the operators list is null. TREE_CODE just blindly dereferences it. This > seems to > be a bug, however I am not sure where. If the bug is in `builtin_mathfn_code` > or in the > code that generated the tree in the first place. Note that this does not > happen if > type casts are not being done. Yes, the FEs likely do not expect internal FNs around in the IL at random places. So first of all I suggest to not try matching on GENERIC (or at least not spend too much energy in making it work there). Richard. > If I change > > float do_cool_stuff(float x) > { > return x*cos(x); > } > > to > > float do_cool_stuff(float x) > { > return x*cosf(x); > } > > The type error can be fixed by adding an extra check: > > if (TREE_CODE (t) != CALL_EXPR > || (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)) > > to `builtin_mathfn_code` and this seems to work fine and I think is correct, > as this function is allowed to say NO.