https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107916
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
aarch64 has a similar issue too:
.L3:
        add     w1, w1, 1
        add     v0.4s, v5.4s, v2.4s
        add     v1.4s, v4.4s, v3.4s
        mov     v2.16b, v0.16b
        mov     v3.16b, v1.16b
        cmp     w0, w1
        bne     .L3

Though it is not as bad, since it is just extra moves inside the loop, as there is OI mode there ... .
This is a generic vect lowering issue, I think.
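
For illustration only, a loop of roughly the following shape could produce a pair of 128-bit adds with loop-carried values like the ones above. This is a minimal sketch and not the testcase from this bug: the `v8si` typedef and the `accumulate` function are hypothetical names, and it assumes a 32-byte vector declared with GCC's `vector_size` extension, which does not fit in a single 128-bit aarch64 Advanced SIMD register and so would have to be lowered into two halves. Whether it reproduces the exact moves shown depends on the compiler version and options.

```c
#include <stdint.h>

/* Assumption: 8 x int32_t = 32 bytes, twice the width of a 128-bit
   Advanced SIMD register, so each operation is split in two. */
typedef int32_t v8si __attribute__((vector_size(32)));

/* Hypothetical reduced loop: add `step` to the accumulator n times.
   The accumulator is carried across iterations, which is where any
   extra register-to-register moves would appear. */
void accumulate(v8si *acc, const v8si *step, int n)
{
    v8si a = *acc;
    v8si s = *step;
    for (int i = 0; i < n; i++)
        a += s;
    *acc = a;
}
```

Compiling this with optimization enabled and inspecting the assembly output (for example with `-O2 -S`) is one way to check whether the loop body contains the extra `mov` instructions.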