https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118521

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=118817

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Looks similar to PR118817 btw.  Like there, we're diagnosing from the strlen
pass, which has a somewhat unfortunate pass position.

<bb 2> [local count: 131235111]:
_53 = operator new (2);

<bb 3> [local count: 131235111]:
MEM <unsigned char[2]> [(char * {ref-all})_53] = MEM <unsigned char[2]> [(char * {ref-all})&C.0];
__result_46 = _53 + 2;
_150 = operator new (4);
goto <bb 5>; [100.00%]

<bb 5> [local count: 131235112]:
_97 = _150 + 2;
__builtin_memset (_97, 0, 2);
MEM <unsigned short> [(char * {ref-all})_150] = 513;
__result_274 = _150 + 1;
__new_finish_106 = __result_274 + 3;
operator delete (_53, 2);
_115 = _150 + 4;
if (__new_finish_106 != _115)
  goto <bb 6>; [82.57%]
else
  goto <bb 7>; [17.43%]

<bb 6> [local count: 108360832]:
MEM[(char *)_97 + 2B] = 1;

Like in the other PR we are missing the power of forwprop, which would have
accumulated the constant adjustments and elided the condition guarding entry
to BB 6.  As you say, it's SCCP exposing the opportunity.

Neither FRE nor DOM has the ability to prove equivalence of larger
expressions like this, aka (_150 + 4) == ((_150 + 1) + 3); they instead
rely on instruction combinations.
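
Spelled out from the dump above, the pointer adjustments fold as

  __new_finish_106 = __result_274 + 3 = (_150 + 1) + 3 = _150 + 4 = _115

so once the constant offsets are accumulated into a single adjustment the
BB 6 entry condition __new_finish_106 != _115 is trivially false and BB 6
becomes unreachable.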

Now, FRE does "fold" each stmt, but tries to simplify it down to a
constant/copy, and if that's not possible it goes with the original stmt for
further processing rather than using the simplified expression.  That's
wasteful.  It also folds at elimination time, so this early folding is
supposedly redundant iff we think the IL should always be fully folded
(which it isn't, obviously).
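
The pattern is roughly the following (a hypothetical sketch, not the actual
tree-ssa-sccvn.cc code; the helper name is made up and the usual gcc headers
are assumed):

/* Sketch: VN asks for a simplification of STMT based on the current
   value-numbers of its operands, but only uses the result if it is a
   constant or a copy; otherwise the original, unsimplified stmt is what
   gets value-numbered and processed further.  */
static tree
sketch_vn_simplify (gimple *stmt)
{
  /* Fold STMT using the value-numbers of its operands.  */
  tree simplified = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize);
  if (simplified
      && (is_gimple_min_invariant (simplified)
          || TREE_CODE (simplified) == SSA_NAME))
    /* Constant or copy - this is what VN records and uses.  */
    return simplified;
  /* Anything more complex (like an accumulated _150 + 4 here) is dropped
     and the original stmt is used - the wasteful part noted above.  */
  return NULL_TREE;
}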

For PR118817 I've addressed this case in PRE.  For the more general VN case
it's a bit more difficult to do cleanly, and definitely out of scope for stage4.

I'll see what the fallout is when moving forwprop4 earlier (the late passes
are oddly ordered IMO).

There's also the pragmatic way of dealing with this in VN, which is
replacing the simplification attempt with in-place folding, but that's
only OK when not iterating (or when we're visiting a stmt for the first
time, but I'd rather not go there).  There's unfortunately a difference
between what fold_stmt and gimple_fold_stmt_to_constant do ... but maybe
it does not matter ... turns out it does.
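
In sketch form that pragmatic variant would be something like the following
(hypothetical, only valid when VN is not iterating; the helper name is made
up):

/* Sketch: fold the statement in the IL directly, valueizing operands,
   instead of asking for a separate simplified tree.  */
static void
sketch_vn_fold_in_place (gimple *stmt)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  if (fold_stmt (&gsi, vn_valueize))
    update_stmt (gsi_stmt (gsi));
}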

All of the attempts have testsuite fallout, of course.

Before r5-1495-g24314386b32b93 strlen was even earlier, but it was specifically
placed before VRP.  forwprop is currently specifically after DSE/DCE
because the single-use gates benefit from DCEd IL.  strlen OTOH is a
source of constants and pruned memory ops so placing it before DSE/DCE
makes sense.  At r5-1495-g24314386b32b93 there wasn't a CCP after VRP,
so it might be possible to move strlen a bit later (but then it will be
after another jump threading...).

Moving forwprop between pass_thread_jumps and pass_dominator does have
quite some diagnostic fallout.

Doing

diff --git a/gcc/passes.def b/gcc/passes.def
index 9fd85a35a63..c02fd0e186d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -346,9 +346,10 @@ along with GCC; see the file COPYING3.  If not see
          form if possible.  */
       NEXT_PASS (pass_thread_jumps, /*first=*/false);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
-      NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
       NEXT_PASS (pass_vrp, true /* final_p */);
+      NEXT_PASS (pass_forwprop, /*last=*/true);
+      NEXT_PASS (pass_strlen);
       /* Run CCP to compute alignment and nonzero bits.  */
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
       NEXT_PASS (pass_warn_restrict);
@@ -356,7 +357,6 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /* remove_unused_locals */);
       /* After late DCE we rewrite no longer addressed locals into SSA
         form if possible.  */
-      NEXT_PASS (pass_forwprop, /*last=*/true);
       NEXT_PASS (pass_sink_code, true /* unsplit edges */);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_fold_builtins);

An even more pragmatic approach is a single level of folding of uses (of
changed defs) from SCCP.  For full effect it would use a worklist and
re-fold uses of defs of folded uses as well, similar to
simple_dce_from_worklist (which could also re-fold uses of defs that
become single-use, for example).

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 0ba85917d41..a0d1c2f3d86 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -284,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "tree-eh.h"

 static tree analyze_scalar_evolution_1 (class loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (class loop *loop,
@@ -3947,6 +3948,19 @@ final_value_replacement_loop (class loop *loop)
          print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (rslt), 0);
          fprintf (dump_file, "\n");
        }
+
+      if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
+       {
+         gimple *use_stmt;
+         imm_use_iterator imm_iter;
+         FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, rslt)
+           {
+             gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+             if (!stmt_can_throw_internal (cfun, use_stmt)
+                 && fold_stmt (&gsi, follow_all_ssa_edges))
+               update_stmt (gsi_stmt (gsi));
+           }
+       }
     }

   return any;

This should have the least chance of regressing things.  I'll report results.
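
For reference, the full-effect worklist variant mentioned above might look
roughly like the following (a hypothetical sketch only; the function name and
its placement are assumptions, not part of the patch):

/* Sketch: fold uses of changed defs and keep re-folding uses of defs
   whose defining stmt we changed, similar in spirit to
   simple_dce_from_worklist.  */
static void
sketch_refold_uses_from_worklist (auto_vec<tree> &worklist)
{
  while (!worklist.is_empty ())
    {
      tree name = worklist.pop ();
      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
        continue;
      gimple *use_stmt;
      imm_use_iterator imm_iter;
      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
        {
          gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
          if (stmt_can_throw_internal (cfun, use_stmt)
              || !fold_stmt (&gsi, follow_all_ssa_edges))
            continue;
          update_stmt (gsi_stmt (gsi));
          /* If the folded stmt defines an SSA name, queue it so its
             uses get re-folded as well.  */
          if (tree lhs = gimple_get_lhs (gsi_stmt (gsi)))
            if (TREE_CODE (lhs) == SSA_NAME)
              worklist.safe_push (lhs);
        }
    }
}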
