https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #4 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
(In reply to Andrew Pinski from comment #2)
> Why are you using `-mprefer-vector-width=512` here?
>
> 512 causes the loop to be needing to be unrolled once more and that is why
> the confusion happening.

I don't think the preferred vector width mattered there, since the vector width used was already 512 bits (zmm registers).

I created a simpler example with no extra flags affecting the preferred vector width, processing chunks of 256 bits (8 × 32-bit elements). The same issue still appears: a lot of weirdly unrolled code that I presume (hope?) is dead, plus some extra code at the start that I can't quite figure out.