https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-11-04
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is probably some FP constraints that say we cannot elide ret += 0.0,
otherwise we'd try to do that resulting in branchy code for foo as well.  If
you add -ffast-math to -O2 you'll see exactly that behavior - we're
presenting RTL expansion with

  <bb 3> [local count: 1063004407]:
  # ret_19 = PHI <0.0(2), prephitmp_25(5)>
  # ivtmp.13_7 = PHI <0(2), ivtmp.13_4(5)>
  k_12 = MEM[base: y_10(D), index: ivtmp.13_7, offset: 0B];
  _6 = MEM[base: x_13(D), index: ivtmp.13_7, offset: 0B];
  if (_6 > 0.0)
    goto <bb 4>; [59.00%]
  else
    goto <bb 5>; [41.00%]

  <bb 4> [local count: 627172604]:
  _24 = k_12 + ret_19;

  <bb 5> [local count: 1063004407]:
  # prephitmp_25 = PHI <_24(4), ret_19(3)>
  ivtmp.13_4 = ivtmp.13_7 + 4;
  if (ivtmp.13_4 == 4096)
    goto <bb 6>; [1.01%]
  else
    goto <bb 3>; [98.99%]

while without -ffast-math 'foo' has retained the unconditional accumulation.
Since RTL optimization chickens out on most FP involved transforms I'm not
surprised it doesn't try to undo this.  We're leaving most if-conversion to
RTL because it has a better idea of target costs.
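
To illustrate the FP constraint mentioned in the comment: under IEEE 754 with
the default round-to-nearest mode, (-0.0) + 0.0 evaluates to +0.0, so
"ret += 0.0" is not a no-op when signed zeros are honored.  The C sketch below
checks that, and shows two loop shapes that roughly correspond to the
conditional and unconditional accumulation discussed above.  The original
testcase is not part of this comment, so the function names, the float element
type (guessed from the 4-byte stride in the dump) and the signatures are
assumptions made for illustration only.

```c
#include <math.h>    /* signbit */
#include <stddef.h>  /* size_t */

/* Returns 1 if adding +0.0 changes the sign bit of x, else 0.
   volatile keeps the compiler from folding the addition away. */
int adding_zero_changes_sign(double x)
{
    volatile double v = x;
    volatile double sum = v + 0.0;
    return (signbit(v) != 0) != (signbit(sum) != 0);
}

/* Hypothetical branchy form: accumulate only when x[i] > 0.
   This matches the shape of the GIMPLE in the dump (bb 3 -> bb 4 -> bb 5). */
float sum_if_positive_branchy(const float *x, const float *y, size_t n)
{
    float ret = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (x[i] > 0.0f)
            ret += y[i];
    return ret;
}

/* Hypothetical unconditional form: always accumulate, adding 0.0
   when the condition is false.  Rewriting this into the branchy form
   requires treating "ret += 0.0" as a no-op. */
float sum_if_positive_select(const float *x, const float *y, size_t n)
{
    float ret = 0.0f;
    for (size_t i = 0; i < n; i++)
        ret += (x[i] > 0.0f) ? y[i] : 0.0f;
    return ret;
}
```

adding_zero_changes_sign(-0.0) reports a sign change, which is why the
compiler cannot in general fold x + 0.0 to x by default.  -ffast-math implies
-fno-signed-zeros, which permits that folding and is consistent with the
branchy code seen with -O2 -ffast-math.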