https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118380
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-01-09
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, LLVM likely unrolls all loops while we don't, so constant propagation
from the initializer doesn't work.  With --param max-completely-peeled-insns=1000
we produce

test256:
.LFB7779:
        .cfi_startproc
        vmovss  .LC0(%rip), %xmm0
        ret

which is better than clang, which fails to eliminate an empty loop.

I think this works as intended: the peeling limit heuristically bounds code
growth and compile time, and obviously cannot anticipate the full follow-up
optimization it would enable.

The __builtin_ia32_vbroadcastss256 call is of course a blocker; confirmed for
that part.
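To illustrate the pattern being discussed: a loop reading from a constant-initialized local array can only be folded to a single constant once the loop is completely peeled, because only then does each load have a known index whose value comes straight from the initializer. The function below is a hypothetical minimal sketch, not the test case from this report (sum_table and its table are made up for illustration). A tiny loop like this one may well be peeled under the default limits; larger loops like the ones in the report are the kind that exceed them.

```c
/* Hypothetical example: a reduction over a constant-initialized local
   array. If the loop is completely peeled, every table[i] load has a
   constant index, constant propagation can substitute the initializer
   values, and the whole function can fold to returning 36.0f. */
float sum_table(void)
{
    const float table[8] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float acc = 0.0f;

    /* Fixed trip count of 8: a candidate for complete peeling,
       subject to GCC's size limits such as max-completely-peeled-insns. */
    for (int i = 0; i < 8; i++)
        acc += table[i];

    return acc;
}
```

One way to observe the effect is to compile with -O2 -S, then again with --param max-completely-peeled-insns=1000, and compare whether the loop survives in the generated assembly.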