https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
commit 837be6c7cfb49e16a18ef8f6c44d98bfa6d2396b
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Nov 9 13:52:58 2022 +0100

    tree-optimization/84646 - remove premature thread path rejection

    This removes a premature rejection that's done later in a different
    way.

            PR tree-optimization/84646
            * tree-ssa-threadbackward.cc (back_threader::maybe_register_path):
            Remove premature cycle rejection.


The last threadfull pass now performs the desired threading but we lack
a later pass that elides the endless loop that remains:

<bb 9> [local count: 477815113]:
# sum_10 = PHI <sum_51(7), sum_10(9)>
# ivtmp.9_24 = PHI <ivtmp.9_53(7), ivtmp.9_31(9)>
ivtmp.9_31 = ivtmp.9_24 + 4;
if (_15 != ivtmp.9_31)
  goto <bb 9>; [89.00%]
else
  goto <bb 10>; [11.00%]

<bb 10> [local count: 118111600]:
# sum_33 = PHI <sum_10(9), sum_35(11), 20000(6), sum_27(8)>
# running_37 = PHI <0(9), running_38(11), 0(6), running_38(8)>

The loop isn't removed by DCE because sum_10 is needed.  This case looks
like a genuine missed copy propagation or value numbering since the
value is always equal to sum_51.  But after threadfull2 we run none
of those passes.  Since VRP is no longer doing copy propagation, we end up with
  <bb 11> [local count: 477815113]:
  # sum_10 = PHI <sum_51(8), sum_48(12)>
  # ivtmp.9_24 = PHI <ivtmp.9_53(8), ivtmp.9_50(12)>
  ivtmp.9_31 = ivtmp.9_24 + 4;
  if (_15 != ivtmp.9_31)
    goto <bb 12>; [89.00%]
  else
    goto <bb 13>; [11.00%]

  <bb 12> [local count: 425255451]:
  # sum_48 = PHI <sum_10(11)>
  # ivtmp.9_50 = PHI <ivtmp.9_31(11)>
  goto <bb 11>; [100.00%]

  <bb 13> [local count: 118111600]:
  # sum_33 = PHI <sum_10(11), sum_35(14), 20000(6), sum_27(9)>

there.  A copyprop pass doesn't handle this degenerate case, and neither
does FRE, iterating or not.  Both CCP and FRE fall into the trap
of starting sum_10 as 20000 and on iteration the above makes sum_10
varying.  FRE would handle the first quoted IL with sum_48 removed though
(even when not iterating).  Currently it's forwprop that turns the 2nd
into the first by means of copy propagation.  The idea was that VRP would
do the job of fully clearing out copies, but apparently that no longer happens.
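
To illustrate why sum_10 is always equal to sum_51 in both quoted ILs, here
is a minimal sketch in Python.  It is a toy model, not GCC code: the dict
layout and the helper name phi_equivalent are made up for illustration.  It
collects the PHIs reachable from a name through PHI arguments and checks
whether exactly one value from outside that set flows in; if so, every PHI
in the set is a copy of that value.

```python
# Toy model only: a dict mapping each PHI result to its argument names.
# Names that are not keys of the dict are treated as non-PHI values.
# Nothing here corresponds to a real GCC data structure or API.

def phi_equivalent(name, phis):
    """Return the single value `name` is always equal to, or None.

    Walk the PHIs reachable from `name` through PHI arguments.  If only
    one value from outside that set of PHIs ever flows in, every PHI in
    the set is a copy of that value.
    """
    seen = set()      # PHI results already visited
    outside = set()   # non-PHI values feeding the visited PHIs
    work = [name]
    while work:
        n = work.pop()
        if n in seen:
            continue
        seen.add(n)
        for arg in phis[n]:
            if arg in phis:
                work.append(arg)   # follow PHI-to-PHI copies
            else:
                outside.add(arg)
    return outside.pop() if len(outside) == 1 else None


# First quoted IL: sum_10 = PHI <sum_51(7), sum_10(9)>
first_il = {"sum_10": ["sum_51", "sum_10"]}

# Second quoted IL: the back edge goes through the singleton PHI sum_48.
second_il = {
    "sum_10": ["sum_51", "sum_48"],
    "sum_48": ["sum_10"],
    "sum_33": ["sum_10", "sum_35", "20000", "sum_27"],
}

print(phi_equivalent("sum_10", first_il))
print(phi_equivalent("sum_10", second_il))
print(phi_equivalent("sum_33", second_il))
```

In this model both forms resolve sum_10 to sum_51, while sum_33 has several
distinct incoming values and is left alone.  The sketch sidesteps the
optimistic-iteration problem described above because it never assigns
sum_10 a constant lattice value in the first place.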

We've had copy_prop in place of CCP, but CCP doesn't clean up this singleton
PHI copy; I'm investigating why.
