On Tue, Sep 16, 2025 at 5:54 AM Richard Biener <[email protected]> wrote: > > On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski > <[email protected]> wrote: > > > > After copy propagation for aggregates patches we might end up with > > now: > > ``` > > tmp = a; > > b = a; // was b = tmp; > > tmp = {CLOBBER}; > > ``` > > To help out ESRA, it would be a good idea to remove the `tmp = a` statement > > as > > there is no DSE between frowprop and ESRA. copy-prop-aggregate-sra-1.c is > > an example > > where the removal of the copy helps ESRA. > > Meh. You know, forwprop becomes a kitchen-sink. But well. Comments below.
Another option is to move both the copy propagation for aggregates (and a few other mem* optimizations which started to collect in forwprop since 2010) into its own pass. Run it after forwprop1 and once after forwprop2. Maybe once after the loop optimizations are done due to the ldist optimization. I hate adding a new pass too. The original 2 things forwprop was used for propagation of addresses and comparisons. Comparisons are almost all done by match-and-simplify; there are still some few that are not (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120206) Propagation of addresses: I wonder if there is a way to get ccp to do that instead; treat in some cases `a_N ptr+ CST` as a "constant" like how it does copy prop already too. So maybe my plan is to remove forwprop and change it into the memory based optimizations with a few other things thrown in. Plus as I mentioned in a different thread to move the funtionality of pass_fold_builtins and pass_optimize_widening_mul into other passes. We can talk more in person at Cauldron on these plans. I hope others can help with this cleanup and not just myself really. > > > This adds a simple DSE which is only designed to remove the `tmp = a` > > statement. > > This shows up a few times in many C++ code including the code from the > > javascript > > interpreter in ladybird, and in the "fake" testcase in PR 108653 and in the > > aarch64 > > specific PR 89967. > > > > This is disabled for -Og as we don't do dse there either. > > intent_optimize_10.f90 testcase needed to be updated as the constant > > shows up in a debug statement now. > > > > Bootstrapped and tested on x86_64-linux-gnu. > > > > gcc/ChangeLog: > > > > * tree-ssa-forwprop.cc (do_simple_agr_dse): New function. > > (pass_forwprop::execute): Call do_simple_agr_dse for clobbers. > > > > gcc/testsuite/ChangeLog: > > > > * gfortran.dg/intent_optimize_10.f90: Update so -g won't fail. > > * gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase. > > > > Signed-off-by: Andrew Pinski <[email protected]> > > --- > > .../tree-ssa/copy-prop-aggregate-sra-1.c | 33 +++++++ > > .../gfortran.dg/intent_optimize_10.f90 | 2 +- > > gcc/tree-ssa-forwprop.cc | 88 +++++++++++++++++++ > > 3 files changed, 122 insertions(+), 1 deletion(-) > > create mode 100644 > > gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > new file mode 100644 > > index 00000000000..baabecf5d27 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > @@ -0,0 +1,33 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O1 -fdump-tree-forwprop1-details > > -fdump-tree-esra-details " } */ > > + > > +struct s1 > > +{ > > + int t[1024]; > > +}; > > + > > +struct s1 f(void); > > + > > +void g(int a, int b, int ); > > +void p(struct s1); > > +static inline void p1(struct s1 inner) > > +{ > > + p(inner); > > +} > > + > > +void h(struct s1 outer) > > +{ > > + p1(outer); > > + g(outer.t[0], outer.t[1], outer.t[2]); > > +} > > +/* Forwprop should be able to copy prop the copy of `inner = outer` to the > > call of p. > > + Also remove this copy. */ > > + > > +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */ > > +/* { dg-final { scan-tree-dump-times "Removing dead store stmt inner = > > outer" 1 "forwprop1" } } */ > > + > > +/* The extra copy that was done by inlining is removed so SRA should not > > decide to cause > > + inner nor outer to be scalarized even for the 3 elements accessed > > afterwards. */ > > +/* { dg-final { scan-tree-dump-times "Disqualifying inner" 2 "esra" } } */ > > +/* { dg-final { scan-tree-dump-times "Disqualifying outer" 1 "esra" } } */ > > + > > diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > index d8bc1bb3b7b..214f04cf70d 100644 > > --- a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > +++ b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > @@ -63,4 +63,4 @@ end program main > > > > ! There is a clobber for tc, so we should manage to optimize away the > > associated initialization constant (but not other > > ! initialization constants). > > -! { dg-final { scan-tree-dump-not "123456789" "optimized" { target > > __OPTIMIZE__ } } } > > +! { dg-final { scan-tree-dump-not "= 123456789" "optimized" { target > > __OPTIMIZE__ } } } > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > index 1eacff01587..17592383e2b 100644 > > --- a/gcc/tree-ssa-forwprop.cc > > +++ b/gcc/tree-ssa-forwprop.cc > > @@ -1708,6 +1708,89 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > > return changed; > > } > > > > +/* Simple DSE of the lhs from a clobber STMT. > > + This is used mostly to clean up from optimize_agr_copyprop and > > + to remove extra copy that might confuse SRA. */ > > +static void > > +do_simple_agr_dse (gassign *stmt, bool full_walk) > > +{ > > + /* Don't do this while in -Og. */ > > + if (optimize_debug) > > + return; > > + ao_ref read; > > + basic_block bb = gimple_bb (stmt); > > + tree lhs = gimple_assign_lhs (stmt); > > + if (!DECL_P (lhs)) > > + return; > > + ao_ref_init (&read, lhs); > > + tree vuse = gimple_vuse (stmt); > > + unsigned limit = full_walk ? param_sccvn_max_alias_queries_per_access : > > 2; I should mention a limit of 2 really should be enough for most common cases. I had thought about changing it over to that fully. That is case where this shows up the most is with small wrapper functions which after inlining we get: ``` a = b; c = a; // or call(a); a = {CLOBBER}; ``` And copy prop changes the above to: ``` a = b; c = b; // or call(b); a = {CLOBBER}; ``` And we want to remove the stmt `a = b;` as SRA will scalarize a even though it is dead later on. Very similar to the reasons why you moved DSE earlier before SRA in GCC 12 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912#c6) . > > + while (limit) > > + { > > + gimple *ostmt = SSA_NAME_DEF_STMT (vuse); > > + if (is_a<gphi*>(ostmt) || gimple_nop_p (ostmt)) > > + break; > > + basic_block obb = gimple_bb (ostmt); > > + /* If the clobber is not fully dominating the statement define, > > + then it is not "simple" to detect if the define is fully > > clobbered. */ > > + if (obb != bb && !dominated_by_p (CDI_DOMINATORS, bb, obb)) > > + return; > > + /* Quickly see if the reference was used in previous statements. */ > > + bool used = false; > > + gimple *use_stmt; > > + imm_use_iterator iter; > > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, gimple_vdef (ostmt)) > > + { > > + basic_block ubb = gimple_bb (use_stmt); > > + if (stmt == use_stmt) > > + continue; > > + /* The use needs to be dominating the clobber. */ > > you want to decrease limit here as well. This loop is going to be the > most expensive part. True, let me look more into this. > > > + if ((ubb != bb && !dominated_by_p (CDI_DOMINATORS, bb, ubb)) > > + || ref_maybe_used_by_stmt_p (use_stmt, &read, false)) > > + { > > + used = true; > > + break; > > + } > > + } > > + if (used) > > + break; > > + vuse = gimple_vuse (ostmt); > > + > > + /* This a store to the clobbered decl, > > + then remove it. */ > > + if (is_a <gassign*>(ostmt) > > What about calls? I was trying to do as simple as possible DSE and the cases I wanted to remove was when there was a copy and that copy might cause SRA to do something off. > > > + && gimple_store_p (ostmt) > > + && !gimple_clobber_p (ostmt) > > + && operand_equal_p (lhs, gimple_assign_lhs (ostmt))) > > Since LHS is a decl a pointer compare would be enough here. True, will change it. > > > + { > > + /* Don't remove statements that are needed for non-call > > + eh to work. */ > > + if (stmt_unremovable_because_of_non_call_eh_p (cfun, ostmt)) > > + return; > > + /* If we delete a stmt that could throw, mark the block > > + in to_purge to cleanup afterwards. */ > > + if (stmt_could_throw_p (cfun, ostmt)) > > stmt_can_throw_internal? > > I don't think you can elide throwing stmts, at least below you are > removing the whole stmt, not just the LHS when this is a call. We can because of the check stmt_unremovable_because_of_non_call_eh_p above. This is the same check that DSE/DCE do (since r12-475-g8ebf6b99952ada). > > > + bitmap_set_bit (to_purge, obb->index); > > + gimple_stmt_iterator gsi = gsi_for_stmt (ostmt); > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + { > > + fprintf (dump_file, "Removing dead store stmt "); > > + print_gimple_stmt (dump_file, ostmt, 0); > > + fprintf (dump_file, "\n"); > > + } > > + unlink_stmt_vdef (ostmt); > > + release_defs (ostmt); > > + gsi_remove (&gsi, true); > > + statistics_counter_event (cfun, "delete dead store", 1); > > So you remove at most exactly one store? Yes, at most one store for each clobber. As I tried to mention this is designed to do a quick cleanup after a copy propagation of aggregates happens but before SRA. It is not to catch that much more. > > > + break; > > + } > > + if (stmt_may_clobber_ref_p_1 (ostmt, &read, false) > > Why stop on clobbers? I'm not saying that's not OK, but I think > this deserves a comment. Will add a comment. The idea is just not to worry about anything except for a copy into the decl. So stopping if the decl was clobbered we stop the walk as soon as possible. Thanks, Andrew > > > + || ref_maybe_used_by_stmt_p (ostmt, &read, false)) > > + break; > > + limit--; > > + } > > +} > > + > > /* Optimizes builtin memcmps for small constant sizes. > > GSI_P is the GSI for the call. STMT is the call itself. > > */ > > @@ -5259,6 +5342,11 @@ pass_forwprop::execute (function *fun) > > changed = true; > > break; > > } > > + if (gimple_clobber_p (stmt)) > > + { > > + do_simple_agr_dse (as_a<gassign*>(stmt), full_walk); > > + break; > > + } > > > > if (TREE_CODE_CLASS (code) == tcc_comparison) > > changed |= forward_propagate_into_comparison (&gsi); > > -- > > 2.43.0 > >
