On Sun, 17 Nov 2024, Jan Hubicka wrote: > Last year I made modref to track which functions are deterministic - i.e. they > produce same effects given same inputs (including memory) and which functions > have no side effects (which includes infinite loops, trapping etc.). > > deterministic functions can be CSEed just as looping pure/const. > These are similar to reproducible/unsequenced attributes. > > This patch implements tree-ssa-sccvn bits so we can now optimize non-pure > function. For example: > > [[gnu::noinline]] > int dup(int *a) > { > a[0]=a[1]; > return a[2]; > } > int > test (int *b) > { > return dup (b) + dup (b); > } > > Will now be optimized to return 2 * dup (b); > > The sccvn bits are quite simple: visit_stmt has to allow VN lookups for > deterministic functions while visit_reference_op_call currently contains test > to give up if function has vdef or no lhs. I think the vdef test was > redundant > even before since pure/const function can return aggregate. Newly even if > there is > no lhs we can eliminate redundancies as in: > > [[gnu::noinline]] > void dup(int *a) > { > a[0]=a[1]; > } > void > test (int *b) > { > dup (b); > dup (b); > } > > I think it is relatively useful feature to optimize out functions that > compute some fields > of structure based on other fields. It optimizes out 2.6K of clang text > segment. This is not much, but I hope it will improve in future. For > example, > right now dup call can not do sanity checks. Call to abort makes us believe > that all global memory is used and that will prevent optimizations. However > modref can keep track of the fact that global memory read only happens upon > termination and ignore those accesses while determining redundancy... > > Also reproduce/unsequential makes it possible to ignore stores of function > call itself > to detemrine redundancy. I am not quite sure how to fit that into PRE. > > bootstrapped/regtested x86_64-linux. OK? > > gcc/ChangeLog: > > * ipa-modref.cc (get_modref_function_summary): Add extra fn paramaeter > to pass value numbering info. > (ipa_modref_can_remove_redundat_calls): New function. > * ipa-modref.h (get_modref_function_summary): Update declaration. > (ipa_modref_can_remove_redundat_calls): declare. > * tree-ssa-sccvn.cc (visit_reference_op_call): Also try to optimize > calls > with vdef and no lhs. > (visit_stmt): Use ipa_modref_can_remove_redundat_calls. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/modref-16.c: New test. > * gcc.dg/tree-ssa/modref-17.c: New test. > > diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc > index 08a7740de94..50d18287219 100644 > --- a/gcc/ipa-modref.cc > +++ b/gcc/ipa-modref.cc > @@ -766,12 +766,14 @@ get_modref_function_summary (cgraph_node *func) > /* Get function summary for CALL if it exists, return NULL otherwise. > If non-null set interposed to indicate whether function may not > bind to current def. In this case sometimes loads from function > - needs to be ignored. */ > + needs to be ignored. > + If FN is non-NULL, it specifies indirect call target (for example when > known > + to value numbering). */ > > modref_summary * > -get_modref_function_summary (gcall *call, bool *interposed) > +get_modref_function_summary (gcall *call, bool *interposed, tree fn) > { > - tree callee = gimple_call_fndecl (call); > + tree callee = fn ? fn : gimple_call_fndecl (call); > if (!callee) > return NULL; > struct cgraph_node *node = cgraph_node::get (callee); > @@ -5648,4 +5650,33 @@ ipa_modref_callee_reads_no_memory_p (gcall *call) > return false; > } > > +/* Return true if CALL can be optimized out provided that was called with > + exactly same value number before. > + > + If FN is non-NULL, it specifies indirect call target (for example when > known > + to value numbering). */ > + > +bool > +ipa_modref_can_remove_redundat_calls (gcall *call, tree fn) > +{ > + if (gimple_call_flags (call) & (ECF_CONST | ECF_PURE)) > + return true; > + if (fn && (flags_from_decl_or_type (fn) & (ECF_CONST | ECF_PURE))) > + return true; > + /* If function has multiple ways to return, we would have to remember > + the way it returned last time. > + Terminating function execution (i.e. throwing external exception) > + is OK. */ > + basic_block bb = gimple_bb (call); > + if (call == gsi_stmt (gsi_last_bb (bb)) && !single_succ_p (bb)) > + return false; > + modref_summary *sum = get_modref_function_summary (call, NULL, fn); > + /* If function is deterministic, we can remove redundant invocations. > + However, in current implementaiton, we must track that loads and stores > do > + not overlap. Special case the situation that function reads and writes > + global memory and therefore we know nothing. */ > + return sum && !sum->nondeterministic > + && (!sum->global_memory_read || !sum->global_memory_written); > +} > + > #include "gt-ipa-modref.h" > diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h > index a0eb63a0afa..5bb4172ef9e 100644 > --- a/gcc/ipa-modref.h > +++ b/gcc/ipa-modref.h > @@ -72,10 +72,12 @@ struct GTY(()) modref_summary > }; > > modref_summary *get_modref_function_summary (cgraph_node *func); > -modref_summary *get_modref_function_summary (gcall *call, bool *interposed); > +modref_summary *get_modref_function_summary (gcall *call, bool *interposed, > + tree fn = NULL); > void ipa_modref_cc_finalize (); > void ipa_merge_modref_summary_after_inlining (cgraph_edge *e); > bool ipa_modref_callee_reads_no_memory_p (gcall *call); > +bool ipa_modref_can_remove_redundat_calls (gcall *call, tree fn = NULL); > > /* All flags that are implied by the ECF_CONST functions. */ > static const int implicit_const_eaf_flags > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c > b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c > new file mode 100644 > index 00000000000..702349bbc02 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-16.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-release_ssa" } */ > +[[gnu::noinline]] > +int dup(int *a) > +{ > + a[0]=a[1]; > + return a[2]; > +} > +int > +test (int *b) > +{ > + return dup (b) + dup (b); > +} > +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c > b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c > new file mode 100644 > index 00000000000..703cb46fabb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-17.c > @@ -0,0 +1,13 @@ > +/* { dg-options "-O2 -fdump-tree-release_ssa" } */ > +[[gnu::noinline]] > +void dup(int *a) > +{ > + a[0]=a[1]; > +} > +void > +test (int *b) > +{ > + dup (b); > + dup (b); > +} > +/* { dg-final { scan-tree-dump-times "dup .b" 1 "release_ssa"} } */ > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > index e93acb44200..6e31f8b3cfe 100644 > --- a/gcc/tree-ssa-sccvn.cc > +++ b/gcc/tree-ssa-sccvn.cc > @@ -5567,8 +5567,6 @@ visit_reference_op_call (tree lhs, gcall *stmt) > modref info to find a candidate to CSE to. */ > const unsigned accesses_limit = 8; > if (!vnresult > - && !vdef > - && lhs
I don't think we handle mem = foo (); correctly in VN, the && lhs condition also guards this. Maybe instead refactor this and the check a few lines above to check (!lhs || TREE_CODE (lhs) == SSA_NAME) ? The VUSE->VDEF chain walking also doesn't consider the call having memory side-effects since it effectively skips intermittent uses. So I believe you have to adjust (or guard) that as well, alternatively visit all uses for each def walked. > && gimple_vuse (stmt) > && (((summary = get_modref_function_summary (stmt, NULL)) > && !summary->global_memory_read > @@ -6354,19 +6352,18 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = > false) > > /* Pick up flags from a devirtualization target. */ > tree fn = gimple_call_fn (stmt); > - int extra_fnflags = 0; > if (fn && TREE_CODE (fn) == SSA_NAME) > { > fn = SSA_VAL (fn); > if (TREE_CODE (fn) == ADDR_EXPR > && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL) > - extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0)); > + fn = TREE_OPERAND (fn, 0); > + else > + fn = NULL; > } > - if ((/* Calls to the same function with the same vuse > - and the same operands do not necessarily return the same > - value, unless they're pure or const. */ > - ((gimple_call_flags (call_stmt) | extra_fnflags) > - & (ECF_PURE | ECF_CONST)) > + else > + fn = NULL; > + if ((ipa_modref_can_remove_redundat_calls (call_stmt, fn) > /* If calls have a vdef, subsequent calls won't have > the same incoming vuse. So, if 2 calls with vdef have the > same vuse, we know they're not subsequent. With disregarding VDEF this comment is now wrong (it's directed at tail-merging btw). I'll note that elimination will only be able to "DCE" calls with a LHS since "DCE" happens by replacing the RHS. That's also what the && lhs check is about - we don't do anything useful during elimination when there's no LHS but the call itself is present in the expression hash. Richard.