https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117557

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The codegen in GCC 15 is:

ld1b    z30.h, p6/z, [x0]
lsl     z30.h, z30.h, #2
uunpkhi z29.s, z30.h
uunpklo z30.s, z30.h
ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
st1w    z29.s, p7, [x24, z12.s, sxtw]
st1w    z31.s, p7, [x24, z12.s, sxtw]

but in GCC 14:

ld1w    {z31.s}, p5/z, [x23, z30.s, sxtw]
ld1w    {z29.s}, p4/z, [x23, z28.s, sxtw]
st1w    {z29.s}, p4, [x24, z12.s, sxtw]
st1w    {z31.s}, p5, [x3, z12.s, sxtw]

It looks like the incorrect mask is used in some cases: when it has to unpack a
vector, it uses the same mask for every entry rather than the unpack mask.

It also stores to the wrong address: it's storing to x24 twice, rather than to
x24 and then x24 + VL.

The GCC 15 code should be

ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
st1w    z29.s, p7, [x24, z12.s, sxtw]
addvl   x3, x24, #2
st1w    z31.s, p3, [x3, z12.s, sxtw]
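
As a rough scalar illustration of the two problems (same mask and same base
reused for both unpacked halves), here is a small C model. It is only a sketch:
the lane count, the function names and the element-indexed offsets are made up
for illustration and are not taken from the testcase, and advancing the base by
LANES elements is just a stand-in for the separate base register (x3) used by
the second store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };  /* hypothetical number of .s lanes per vector */

/* Scalar model of a predicated scatter store: inactive lanes write nothing.
   Offsets are element indices here, a simplification of the real addressing. */
static void scatter_store(int32_t *base, const int32_t offs[LANES],
                          const int32_t vals[LANES], const bool mask[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        assert(offs[i] >= 0);  /* the model only handles non-negative offsets */
        if (mask[i])
            base[offs[i]] = vals[i];
    }
}

/* Expected shape: each unpacked half gets its own mask and its own base. */
static void store_expected(int32_t *dst, const int32_t offs[LANES],
                           const int32_t lo[LANES], const int32_t hi[LANES],
                           const bool mask[2 * LANES])
{
    scatter_store(dst, offs, lo, mask);                  /* first-half mask */
    scatter_store(dst + LANES, offs, hi, mask + LANES);  /* second-half mask, advanced base */
}

/* Observed shape: the second store reuses the first store's mask and base. */
static void store_observed(int32_t *dst, const int32_t offs[LANES],
                           const int32_t lo[LANES], const int32_t hi[LANES],
                           const bool mask[2 * LANES])
{
    scatter_store(dst, offs, lo, mask);
    scatter_store(dst, offs, hi, mask);  /* overwrites lanes written just above */
}
```

In this model the observed variant never touches the second half of the
destination and clobbers the first half with the second half's values in
every lane that is active in the first mask.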

Looking into where we mess it up, it looks like the vectorizer is using the
wrong defs.

I'm running cvise for a testcase.
