https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557
--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The codegen in GCC 15 is:

        ld1b    z30.h, p6/z, [x0]
        lsl     z30.h, z30.h, #2
        uunpkhi z29.s, z30.h
        uunpklo z30.s, z30.h
        ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
        ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
        st1w    z29.s, p7, [x24, z12.s, sxtw]
        st1w    z31.s, p7, [x24, z12.s, sxtw]

but in GCC 14:

        ld1w    {z31.s}, p5/z, [x23, z30.s, sxtw]
        ld1w    {z29.s}, p4/z, [x23, z28.s, sxtw]
        st1w    {z29.s}, p4, [x24, z12.s, sxtw]
        st1w    {z31.s}, p5, [x3, z12.s, sxtw]

It looks like the incorrect mask is used in some cases: when it has to unpack
a vector, it uses the same mask for every entry rather than the corresponding
unpacked mask.

It also stores to the wrong address. It's storing to x24 twice, rather than
x24 + VL.

The GCC 15 code should be:

        ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
        ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
        st1w    z29.s, p7, [x24, z12.s, sxtw]
        addvl   x3, x24, #2
        st1w    z31.s, p3, [x3, z12.s, sxtw]

I'm looking at where we mess it up. It looks like the vectorizer is using the
wrong defs. I'm running cvise to get a reduced testcase.
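
For illustration only, here is a scalar C model of what the expected code does
when a vector is unpacked into two halves: each half is loaded and stored
under its own half of the mask, and the second store goes to an advanced base
address. This is a sketch, not the testcase from this PR. The names N, HALF,
unpack_mask, gather_store_half and masked_copy are hypothetical, and the
scatter store through z12 is simplified to a contiguous store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 8 };         /* hypothetical lane count of the narrow (.h) vector */
enum { HALF = N / 2 };  /* lane count of each widened (.s) vector */

/* Split a mask the same way the data is split by uunpklo/uunpkhi:
   the low half governs lanes 0..HALF-1, the high half the rest. */
static void unpack_mask(const bool mask[N], bool lo[HALF], bool hi[HALF])
{
    for (size_t i = 0; i < HALF; i++) {
        lo[i] = mask[i];
        hi[i] = mask[HALF + i];
    }
}

/* Masked gather from table followed by a masked store to out, for one half.
   Inactive lanes leave out[] untouched. */
static void gather_store_half(const int32_t *table, const uint16_t *idx,
                              const bool *mask, int32_t *out)
{
    for (size_t i = 0; i < HALF; i++)
        if (mask[i])
            out[i] = table[idx[i]];
}

/* Expected behaviour: the low half uses the low mask and base `out`,
   the high half uses the high mask and base `out + HALF`. */
void masked_copy(const int32_t *table, const uint16_t idx[N],
                 const bool mask[N], int32_t out[N])
{
    bool lo[HALF], hi[HALF];

    assert(table && idx && mask && out);
    unpack_mask(mask, lo, hi);
    gather_store_half(table, idx, lo, out);
    gather_store_half(table, idx + HALF, hi, out + HALF);
}
```

In this model, the GCC 15 output quoted above would correspond to passing lo
and out again for the second store, instead of hi and out + HALF.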