https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118297
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2025-01-07
     Ever confirmed|0                           |1

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and yes - we fail to "thread" the j == 0 check of the inner loop (the one
you identify as the outer loop), so we fail to realize that the inner loop body
rolls only once. We're doing this later. Possibly loop header copying could
realize this - we had improvements there to catch these kinds of cases - but
possibly number-of-iterations analysis needs to be improved here. We also
refuse to loop-header copy this because there's a pow() call in the block.

The thread2 pass after the loop passes finally elides one of the loops, but the
j == 0 check remains and is only elided by threadfull2, which sees the function
with all loops removed.

We do apply SLP vectorization with -march=znver3, so I wonder what you think we
are missing (apart from the confusing -fopt-info-missed messages)?