https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103061
--- Comment #7 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
Simplified version without the noise:

  <bb 35> [local count: 56063504182]:
  _134 = M.10_120 + 1;
  if (_71 <= _134)
    goto <bb 19>; [11.00%]
  else
    goto <bb 41>; [89.00%]
...
...
  <bb 41> [local count: 49896518755]:

  <bb 20> [local count: 56063503181]:
  # lb_75 = PHI <_134(41), 1(18)>
  _117 = mstep_49 + lb_75;
  _118 = _117 + -1;
  _119 = mstep_49 + _118;
  M.10_120 = MIN_EXPR <_119, _71>;
  if (lb_75 > M.10_120)
    goto <bb 21>; [11.00%]
  else
    goto <bb 22>; [89.00%]

If _134 can be assumed to be greater than M.10_120, then we can thread
41->20->21.
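To make the condition concrete, here is a rough C model of the 41->20 path. It is only a sketch and is not taken from the testcase: the function and parameter names are hypothetical, plain `long` arithmetic stands in for whatever type the IL uses, and overflow is ignored. It assumes that the M.10_120 read in bb 35 is the value from an earlier execution of bb 20, since the dump shows M.10_120 being defined in bb 20 itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MIN_EXPR in the dump. */
static long min_long(long a, long b) { return a < b ? a : b; }

/* Models bb 20 when it is entered from bb 41, so lb_75 == _134.
   m_prev : value of M.10_120 as read in bb 35 (assumed to be the
            previous definition, not the one computed below)
   n71    : _71
   mstep  : mstep_49
   Returns true when bb 20 would branch to bb 21.  */
static bool bb20_goes_to_bb21(long m_prev, long n71, long mstep)
{
    long v134 = m_prev + 1;            /* bb 35: _134 = M.10_120 + 1 */
    assert(!(n71 <= v134));            /* the 35->41 edge was taken */

    long lb = v134;                    /* PHI selects _134 on 41->20 */
    long v117 = mstep + lb;
    long v118 = v117 + -1;
    long v119 = mstep + v118;
    long m_new = min_long(v119, n71);  /* M.10_120 is redefined here */

    return lb > m_new;                 /* if (lb_75 > M.10_120) */
}
```

Under these assumptions, `bb20_goes_to_bb21(4, 100, 1)` returns false, while `bb20_goes_to_bb21(4, 100, 0)` returns true. So in this model "_134 > M.10_120" holds for the value read in bb 35, but not necessarily for the value recomputed in bb 20, which is the one the final comparison uses.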