On Fri, Jan 27, 2017 at 4:00 PM, Tamar Christina
<[email protected]> wrote:
> Hi All,
>
> I am trying to understand how match.pd works as I'm writing a simple matching
> rule but have run into some issues
> and there's very little documented on match.pd.
>
> short version:
>
> 1) Why is there a difference in expressiveness between the GIMPLE and the
> GENERIC
> versions of match.pd. Particularly why does GENERIC has the limitation
> that it must
> be an internal function while GIMPLE is happy to just do the substitution.
I think you run into the limitation that during GENERIC we simply
don't know reliably which functions we may
emit calls to.
> 2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the
> Fortran version?
> I have created an example where the GIMPLE trees of the two match exactly
> % some attributes
> on the expressions.
Fortran (and other frontends besides C-family) do not get their
builtins from builtins.def but need to
provide their own version. See fortran/mathbuiltins.def.
> 3) It seems to be either or, you can't have both rules? e.g. I either match
> in GENERIC or GIMPLE
> but not both? Is this correct? Right now if some other optimization makes
> it produce x*cos(x) then
> this will never be matched again? Since the GIMPLE passes don't seem to
> like the internal builtin?
That's not correct. You always match both unless you disable one or
the other explicitely. GIMPLE passes
should use cfn_ throughout, otherwise it's a bug.
> 4) There seems to be a bug with matching in GENERIC pass and when type
> conversions are being done. I
> am not sure who is wrong though. In the TREE the type has been correctly
> casted but it results in a segfault.
>
> Thanks,
> Tamar
>
> ----
>
> For the long version and to see my current understanding of match.pd and what
> I have done:
>
> Let’s say I want to transform occurrences of `x*cos(x)` with a super-dooper
> optimized builtin `foo`.
> So `x*cos(x) => foo(x)`.
>
> I start off at the obvious place, `match.pd`.
>
> Since I need a new builtin I have to first define it in `builtins.def`. Since
> I’m replacing a math function I want optimized versions for `float`,
> `double`, etc.
>
> DEF_GCC_BUILTIN (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE,
> ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT,
> ATTR_CONST_NOTHROW_LEAF_LIST)
> DEF_GCC_BUILTIN (BUILT_IN_FOOL, "fool",
> BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
> #define FOO_TYPE(F) BT_FN_##F##_##F##_##F
> DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE,
> ATTR_CONST_NOTHROW_LEAF_LIST)
> #undef FOO_TYPE
>
> So now I have the builtins defined (I need to also define the proper handling
> code in builtins.c but that’s unrelated to the question) and can define the
> match rule:
>
> /* x*cos(x) -> foo(x). */
> (for coss (COS)
> (simplify
> (mult:c (coss:s @0) @1)
> ...
> )
> )
>
> And the question is what goes into the … block.
>
> I have builtins defined for each of the functions, so one implementation
> could be:
>
> /* x*cos(x) -> foo(x). */
> (for coss (COS)
> (simplify
> (mult:c (coss:s @0) @1)
> (switch
> (if (types_match (type, float_type_node))
> (BUILT_IN_FOOF @1 @0))
> (if (types_match (type, double_type_node))
> (BUILT_IN_FOO @1 @0))
> (if (types_match (type, long_double_type_node))
> (BUILT_IN_FOOL @1 @0))
> )
> )
> )
>
> But one of the artifacts of defining the builtin in `builtins.def` is that an
> iterator is created for you for use in `match.pd` by `gencfn-macros.c`, in
> this case I have something like:
>
> (define_operator_list FOO
> BUILT_IN_FOOF
> BUILT_IN_FOO
> BUILT_IN_FOOL
> null)
>
> The last `null` is important but I'll get to that later. These are generated
> and placed in `cfn-operators.pd` which is
> included by `match.pd`.
>
> The other artifact is a macro, for use in case statements in e.g. builtin.c,
> e.g. `CASE_CFN_FOO`
> Which expands to:
>
> case CFN_BUILT_IN_FOOF:
> case CFN_BUILT_IN_FOOT:
> case CFN_BUILT_IN_FOOL:
> case CFN_FOO:
>
> These are handy, because it means we can easily write match statements to the
> correct type. The iterator items are generated in increasing order of width
> of the type. E.g. float, double, long double.
>
> So by iterating over both `COS` and the `FOO` iterator at the same time, we
> can simplify the the match rule:
>
> /* x*cos(x) -> foo(x). */
> (for coss (COS)
> foos (FOO)
> (simplify
> (mult:c (coss:s @0) @1)
> (foos @0)
> )
> )
Looks good sofar. But note that the FOOs are likely not !
builtin_decl_implicit and thus
we won't generate calls to them unless calls to them are already
present in the program
(this is determined during gimplification thus _after_ GENERIC).
> This rule gets compiled into two files `generic-match.c` and `gimple-match.c`.
> The first working on GENERIC tree and the second ofcourse on GIMPLE trees.
>
> Given that GIMPLE and GENERIC have slightly different semantics the code for
> these
> two are fairly different. The placement of the rule in `match.pd` seems to be
> important,
> the rules are applied in order of their specification in `match.pd`.
>
> The GIMPLE matcher will generate statements such as
>
> switch (gimple_call_combined_fn (def))
> {
> case CFN_BUILT_IN_COSF:
> {
> tree o20 = gimple_call_arg (def, 0);
> if ((o20 = do_valueize (valueize, o20)))
> {
> {
> ...
> *res_code = CFN_BUILT_IN_FOOF;
> res_ops[0] = captures[1];
> gimple_resimplify1 (lseq, res_code, type, res_ops,
> valueize);
> return true;
> }
> }
> break;
> }
> case CFN_BUILT_IN_COSS:
> {
> tree o20 = gimple_call_arg (def, 0);
> if ((o20 = do_valueize (valueize, o20)))
> {
> {
> ...
> *res_code = CFN_BUILT_IN_FOO;
> res_ops[0] = captures[1];
> gimple_resimplify1 (lseq, res_code, type, res_ops,
> valueize);
> return true;
> }
> }
> break;
> }
> ...
> default:;
> }
> }
>
> Because of the order of the iterations, the correct function is matched. e.g.
> COSF -> FOOF,
> COS -> FOO, etc.
>
> The GENERIC version has the same global structure but differs vastly in how
> replacements are done:
>
> case CALL_EXPR:
> switch (get_call_combined_fn (op0))
> {
> case CFN_BUILT_IN_COSF:
> {
> tree o20 = CALL_EXPR_ARG (op0, 0);
> {
> ...
> res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type,
> 1, res_op0);
> if (!res)
> return NULL_TREE;
> if (TREE_SIDE_EFFECTS (captures[2]))
> res = build2_loc (loc, COMPOUND_EXPR, type,
> fold_ignored_result (captures[2]), res);
> return res;
> }
> break;
> }
> case CFN_BUILT_IN_COSS:
> ...
>
> The important function here is `maybe_build_call_expr_loc`, again, I'll get
> to that later.
>
> Testing it out, given a C input file:
>
> #include <math.h>
>
> float do_cool_stuff(float x)
> {
> return x*cos(x);
> }
>
> we get:
>
> do_cool_stuff (float x)
> {
> float D.4154;
>
> _1 = (double) x;
> _2 = __builtin_foo (_1);
> D.4154 = (float) _2;
> return D.4154;
> }
>
> The rule seems to be matching early in the GIMPLE phase and not the generic,
> fair enough as long as it works.
Btw, why are you interested in GENERIC matching at all here?
> It also correctly adds type casting and calling the double version of `foo`.
>
> Fortran however disagrees:
>
> For my Fortran function
>
> function func()
> real :: j ! output
> j = j * cos(j)
> end function func
>
> I get
>
> func ()
> {
> real(kind=4) j;
>
> _1 = __builtin_cosf (j);
> j = j * _1;
> }
>
> So fortran is not applying the rule on neither GENERIC nor GIMPLE.
See above.
> I am not sure why the GIMPLE one does not apply, but the GENERIC one doesn't
> apply
> for neither Fortran nor C because of an implicit constrains on
> `generic-match.c` which comes
> from `maybe_build_call_expr_loc`.
>
> This function will only allow the substitution if the function to be
> substituted to is an internal builtin.
> Presumably because it wants to prevent introducing a function that it cannot
> fold or lower later. As this might
> introduce a link error later if it can't remove it.
No, it also allows other substitution:
tree
maybe_build_call_expr_loc (location_t loc, combined_fn fn, tree type,
int n, ...)
{
va_list ap;
tree *argarray = XALLOCAVEC (tree, n);
int i;
va_start (ap, n);
for (i = 0; i < n; i++)
argarray[i] = va_arg (ap, tree);
va_end (ap);
if (internal_fn_p (fn))
{
...
{
tree fndecl = builtin_decl_implicit (as_builtin_fn (fn));
if (!fndecl)
return NULL_TREE;
return build_call_expr_loc_array (loc, fndecl, n, argarray);
but subject to the same implicit rule as GIMPLE (but handled more
conservatively as I outlined above).
> Fair enough, the way to tell it that it can do get rid of the function is by
> adding it to `internal-fn.def`.
>
> A comment in the file states that:
>
> DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
> the function implements the computational part of a built-in math
> function BUILT_IN_<NAME>{F,,L}.
>
> Which is exactly what I need.
>
> So to `internal-fn.def` I add a new entry
>
> DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary)
>
> Which now in order for it to work requires the optab `foo` to be defined. So
> in `optabs.def`
> I add:
>
> OPTAB_D (foo_optab, "foo$F$a2")
>
> As the comment suggests I expected a mapping to be done between the typed
> functions.
> e.g. I expected there to be 3 new internal functions, one for float, double
> and long double.
>
> However only one new builtin is generated `IFN_FOO`. And this is added in the
> iterator for FOO
> in place of the `(null)` we saw earlier:
>
> (define_operator_list FOO
> BUILT_IN_FOOF
> BUILT_IN_FOO
> BUILT_IN_FOOL
> IFN_FOO)
internal functions are untyped and thus we use them as overloads for
all arg types.
> Re-running the fortran example it still does not match. This is because the
> `cos` is
> expanded into `cosf` and then matches against `BUILT_IN_FOOF` and not the
> internal function.
> So generic-match.c still refuses to match it.
>
> The only way I have found to make it match is to change the matching rule to:
>
> /* x*cos(x) -> foo(x). */
> (for coss (COS)
> (simplify
> (mult:c (coss:s @0) @1)
> (IFN_FOO @0)
> )
> )
>
> e.g. Map all the rules to the type generic IFN_FOO.
>
> Running the example now still produces nothing, this is because if I
> understood correctly the rewrite
> is only done *if* an optab for the required type actually exists. So I create
> one in the backend.
>
> I create an obtab for both double and float types (doing so in the aarch64
> backend):
>
> (define_insn "foo<mode>2"
> [(set (match_operand:GPF_F16 0 "register_operand" "=w")
> (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
> "TARGET_FLOAT"
> "fabs\\t%<s>0, %<s>1"
> [(set_attr "type" "ffarith<stype>")]
> )
>
> The operation is non-sense but that doesn't matter, we just care about the
> gimple:
>
> func ()
> {
> real(kind=4) j;
>
> j = FOO (j);
> }
>
>
> So the fortan code has correctly replaced `x*cos(x)` with `FOO`.
>
> C however now crashes with a segfault:
>
> cos.c: In function ‘do_cool_stuff’:
> cos.c:5:4: internal compiler error: Segmentation fault
> return x*cos(x);
> ^~~~~~
> 0xb55ddf crash_signal
> /work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333
> 0x6f227c builtin_mathfn_code(tree_node const*)
> /work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532
> 0x76070e convert_to_real_1
>
> This happens
>
> if (TREE_CODE (t) != CALL_EXPR
> || TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)
>
> here because `CALL_EXPR_FN (t)` returns NULL. This is because one of the
> operators
> in the operators list is null. TREE_CODE just blindly dereferences it. This
> seems to
> be a bug, however I am not sure where. If the bug is in `builtin_mathfn_code`
> or in the
> code that generated the tree in the first place. Note that this does not
> happen if
> type casts are not being done.
Yes, the FEs likely do not expect internal FNs around in the IL at
random places.
So first of all I suggest to not try matching on GENERIC (or at least
not spend too much
energy in making it work there).
Richard.
> If I change
>
> float do_cool_stuff(float x)
> {
> return x*cos(x);
> }
>
> to
>
> float do_cool_stuff(float x)
> {
> return x*cosf(x);
> }
>
> The type error can be fixed by adding an extra check:
>
> if (TREE_CODE (t) != CALL_EXPR
> || (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR))
>
> to `builtin_mathfn_code` and this seems to work fine and I think is correct,
> as this function is allowed to say NO.