https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118521
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118817

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looks similar to PR118817 btw.  Like there, we're diagnosing from the strlen
pass, which has a somewhat unfortunate pass position.

  <bb 2> [local count: 131235111]:
  _53 = operator new (2);

  <bb 3> [local count: 131235111]:
  MEM <unsigned char[2]> [(char * {ref-all})_53] = MEM <unsigned char[2]> [(char * {ref-all})&C.0];
  __result_46 = _53 + 2;
  _150 = operator new (4);
  goto <bb 5>; [100.00%]

  <bb 5> [local count: 131235112]:
  _97 = _150 + 2;
  __builtin_memset (_97, 0, 2);
  MEM <unsigned short> [(char * {ref-all})_150] = 513;
  __result_274 = _150 + 1;
  __new_finish_106 = __result_274 + 3;
  operator delete (_53, 2);
  _115 = _150 + 4;
  if (__new_finish_106 != _115)
    goto <bb 6>; [82.57%]
  else
    goto <bb 7>; [17.43%]

  <bb 6> [local count: 108360832]:
  MEM[(char *)_97 + 2B] = 1;

Like in the other PR we are missing the power of forwprop, which would have
accumulated the constant adjustments and elided the BB 6 entry condition.  As
you say, it's SCCP exposing the opportunity.  Neither FRE nor DOM has the
ability to prove equivalence of larger expressions like this, aka
(_150 + 4) == ((_150 + 1) + 3); instead they rely on instruction combinations.
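For illustration only (hand-simplified, not actual dump output), accumulating
the constant adjustments would leave something like

  __new_finish_106 = _150 + 4;
  ...
  _115 = _150 + 4;
  if (__new_finish_106 != _115)

at which point the comparison is trivially false, BB 6 becomes unreachable and
the diagnosed store is gone.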
Now, FRE does "fold" each stmt, but it tries to simplify it down to a
constant/copy and, if that's not possible, goes with the original stmt for
further processing rather than using the simplified expression.  That's
wasteful.  It also folds at elimination time, so this early folding is
supposedly redundant iff we think the IL should always be fully folded
(which it isn't, obviously).

For PR118817 I've addressed this case in PRE.  For the more general VN case
it's a bit more difficult to do cleanly and definitely out of scope for
stage4.  I'll see what the fallout is when moving forwprop4 earlier (the late
passes are oddly ordered IMO).  There's also the pragmatic way of dealing with
this in VN, which is replacing the simplification attempt with in-place
folding, but that's only OK when not iterating (or when we're visiting a stmt
for the first time, but I'd rather not go there).  There's unfortunately a
difference between what fold_stmt and gimple_fold_stmt_to_constant do ... but
maybe it does not matter ... turns out it does.  All of the attempts have
testsuite fallout, of course.

Before r5-1495-g24314386b32b93 strlen was even earlier, but it was
specifically placed before VRP.  forwprop is currently specifically after
DSE/DCE because the single-use gates benefit from DCEd IL.  strlen OTOH is a
source of constants and pruned memory ops, so placing it before DSE/DCE makes
sense.  At r5-1495-g24314386b32b93 there wasn't a CCP after VRP, so it might
be possible to move strlen a bit later (but then it will be after another
jump threading...).  Moving forwprop between pass_thread_jumps and
pass_dominator does have quite some diagnostic fallout.  Doing

diff --git a/gcc/passes.def b/gcc/passes.def
index 9fd85a35a63..c02fd0e186d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -346,9 +346,10 @@ along with GCC; see the file COPYING3.  If not see
          form if possible.  */
       NEXT_PASS (pass_thread_jumps, /*first=*/false);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
-      NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
       NEXT_PASS (pass_vrp, true /* final_p */);
+      NEXT_PASS (pass_forwprop, /*last=*/true);
+      NEXT_PASS (pass_strlen);
       /* Run CCP to compute alignment and nonzero bits.  */
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
       NEXT_PASS (pass_warn_restrict);
@@ -356,7 +357,6 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /* remove_unused_locals */);
       /* After late DCE we rewrite no longer addressed locals into SSA form
          if possible.  */
-      NEXT_PASS (pass_forwprop, /*last=*/true);
       NEXT_PASS (pass_sink_code, true /* unsplit edges */);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_fold_builtins);

An even more pragmatic approach is a single level of folding of uses (of
changed defs) from SCCP.  For full effect it would use a worklist and re-fold
uses of defs of folded uses as well, similar to simple_dce_from_worklist
(which could also re-fold uses of defs that become single-use, for example).

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 0ba85917d41..a0d1c2f3d86 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -284,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "tree-eh.h"
 
 static tree analyze_scalar_evolution_1 (class loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (class loop *loop,
@@ -3947,6 +3948,19 @@ final_value_replacement_loop (class loop *loop)
 	  print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (rslt), 0);
 	  fprintf (dump_file, "\n");
 	}
+
+      if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
+	{
+	  gimple *use_stmt;
+	  imm_use_iterator imm_iter;
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, rslt)
+	    {
+	      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+	      if (!stmt_can_throw_internal (cfun, use_stmt)
+		  && fold_stmt (&gsi, follow_all_ssa_edges))
+		update_stmt (gsi_stmt (gsi));
+	    }
+	}
     }
 
   return any;

this should have the least chance of regressing things.  I'll report results.
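For completeness, a rough and untested sketch of how the "full effect"
worklist variant mentioned above could look, placed at the same spot as the
hunk above and reusing the same folding helpers (illustration only, not a
proposed patch; a visited set would likely be wanted to guard against
re-processing):

      /* Sketch: re-fold users of RSLT and, transitively, users of any defs
	 that folding managed to change.  */
      auto_vec<tree> worklist;
      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
	worklist.safe_push (rslt);
      while (!worklist.is_empty ())
	{
	  tree name = worklist.pop ();
	  gimple *use_stmt;
	  imm_use_iterator imm_iter;
	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
	    {
	      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
	      if (stmt_can_throw_internal (cfun, use_stmt)
		  || !fold_stmt (&gsi, follow_all_ssa_edges))
		continue;
	      update_stmt (gsi_stmt (gsi));
	      /* The folded stmt's def may let its own users simplify.  */
	      tree lhs = gimple_get_lhs (gsi_stmt (gsi));
	      if (lhs
		  && TREE_CODE (lhs) == SSA_NAME
		  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
		worklist.safe_push (lhs);
	    }
	}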