https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
commit 837be6c7cfb49e16a18ef8f6c44d98bfa6d2396b
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Nov 9 13:52:58 2022 +0100

    tree-optimization/84646 - remove premature thread path rejection

    This removes a premature rejection that's done later in a different
    way.

            PR tree-optimization/84646
            * tree-ssa-threadbackward.cc
            (back_threader::maybe_register_path): Remove premature cycle
            rejection.

The last threadfull pass now performs the desired threading but we lack a
later pass that elides the endless loop that remains:

  <bb 9> [local count: 477815113]:
  # sum_10 = PHI <sum_51(7), sum_10(9)>
  # ivtmp.9_24 = PHI <ivtmp.9_53(7), ivtmp.9_31(9)>
  ivtmp.9_31 = ivtmp.9_24 + 4;
  if (_15 != ivtmp.9_31)
    goto <bb 9>; [89.00%]
  else
    goto <bb 10>; [11.00%]

  <bb 10> [local count: 118111600]:
  # sum_33 = PHI <sum_10(9), sum_35(11), 20000(6), sum_27(8)>
  # running_37 = PHI <0(9), running_38(11), 0(6), running_38(8)>

The loop isn't removed by DCE because sum_10 is needed. This case looks
like a genuine missed copy propagation or value numbering since the value
is always equal to sum_51. But after threadfull2 we have none of those.
VRP is no longer doing copy propagation, so we end up with

  <bb 11> [local count: 477815113]:
  # sum_10 = PHI <sum_51(8), sum_48(12)>
  # ivtmp.9_24 = PHI <ivtmp.9_53(8), ivtmp.9_50(12)>
  ivtmp.9_31 = ivtmp.9_24 + 4;
  if (_15 != ivtmp.9_31)
    goto <bb 12>; [89.00%]
  else
    goto <bb 13>; [11.00%]

  <bb 12> [local count: 425255451]:
  # sum_48 = PHI <sum_10(11)>
  # ivtmp.9_50 = PHI <ivtmp.9_31(11)>
  goto <bb 11>; [100.00%]

  <bb 13> [local count: 118111600]:
  # sum_33 = PHI <sum_10(11), sum_35(14), 20000(6), sum_27(9)>

there. A copyprop pass doesn't handle this degenerate case, and neither
non-iterating nor iterating FRE does. Both CCP and FRE fall into the trap
of starting sum_10 as 20000 and on iteration the above makes sum_10
varying. FRE would handle the first quoted IL with sum_48 removed though
(even when not iterating).
Currently it's forwprop that turns the second IL into the first by means of copy propagation. The idea was that VRP would do the job of fully clearing out copies, but apparently that no longer happens. We've had copy_prop in place of CCP, but CCP doesn't clean up this singleton PHI copy; investigating why.
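
For illustration only, the degenerate case can be modelled outside of GCC with a small Python sketch. It assumes a toy representation (a dict from PHI result to argument names, which is not how GCC stores PHIs) and the hypothetical helper names phi_leaves and degenerate_copies. A PHI is treated as a copy when exactly one non-PHI value is reachable through its PHI arguments. This is not what copyprop, CCP or FRE implement; it only shows why sum_10 and sum_48 are both equal to sum_51 in the second quoted IL.

```python
# Toy model, not GCC code: each PHI result maps to its list of arguments.
# Names are taken from the second quoted IL; anything that is not a key
# of this dict (sum_51, ivtmp.9_31, 20000, ...) counts as a leaf value.
PHIS = {
    "sum_10": ["sum_51", "sum_48"],
    "ivtmp.9_24": ["ivtmp.9_53", "ivtmp.9_50"],
    "sum_48": ["sum_10"],
    "ivtmp.9_50": ["ivtmp.9_31"],
    "sum_33": ["sum_10", "sum_35", "20000", "sum_27"],
}


def phi_leaves(name, phis, seen=None):
    """Collect the non-PHI values reachable from `name` via PHI arguments."""
    if seen is None:
        seen = set()
    if name not in phis:
        return {name}      # leaf: a constant or a non-PHI definition
    if name in seen:
        return set()       # already visited PHI, contributes nothing new
    seen.add(name)
    leaves = set()
    for arg in phis[name]:
        leaves |= phi_leaves(arg, phis, seen)
    return leaves


def degenerate_copies(phis):
    """Map every PHI that has exactly one reachable leaf to that leaf."""
    copies = {}
    for name in phis:
        leaves = phi_leaves(name, phis)
        if len(leaves) == 1:
            copies[name] = next(iter(leaves))
    return copies


COPIES = degenerate_copies(PHIS)
```

In this model COPIES maps sum_10 and sum_48 to sum_51 and ivtmp.9_50 to ivtmp.9_31, while ivtmp.9_24 and sum_33 remain real merges. Starting from the leaves rather than from an optimistic initial constant is what sidesteps the 20000-then-varying trap in this toy setting.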