https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122445
Bug ID: 122445
Summary: Wrong code since r16-4391-g85ab3a22ed11c9
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: vineetg at rivosinc dot com
Target Milestone: ---
Target: riscv
Since the introduction of grouped gather, x264 fails with zvl256b and
-mrvv-vector-bits=zvl.
I tracked it down to this loop:
    for( int y = 0; y < 4; y++ )
    {
        for( int x = 0; x < 4; x++ )
            p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
        p_dst += FDEC_STRIDE;
    }
compiled with -O3 -march=rv64gcv_zvl256b -mrvv-vector-bits=zvl
-mtune=generic-ooo. Since the patch, we use a strided store for it.
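
For reference, a self-contained sketch of the loop that can be compiled with
the flags above. The wrapper name add4x4, the clip_uint8 stand-in, the int16_t
type of d and the FDEC_STRIDE value of 32 are assumptions made for
illustration, not taken from this report, and the snippet has not been
verified to trigger the miscompile on its own.

```c
#include <stdint.h>

/* Assumption: row stride of the destination buffer in bytes. */
#define FDEC_STRIDE 32

/* Stand-in for x264_clip_uint8: clamp an int to the range [0, 255]. */
static inline uint8_t clip_uint8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : (uint8_t)x;
}

/* Hypothetical wrapper around the loop from the report: adds a 4x4 block
   of residuals d to a 4x4 block of pixels whose rows are FDEC_STRIDE
   bytes apart, clamping each result to 8 bits. */
void add4x4(uint8_t *p_dst, const int16_t *d)
{
    for (int y = 0; y < 4; y++)
    {
        for (int x = 0; x < 4; x++)
            p_dst[x] = clip_uint8(p_dst[x] + d[y * 4 + x]);
        p_dst += FDEC_STRIDE;
    }
}
```

Each row writes four adjacent bytes and the rows are a fixed stride apart,
which is the shape that can be expressed as a strided store of four wider
elements.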
While the gather/strided code is reasonable, the issue actually lies somewhere
else: in avlprop we try to backpropagate the AVL (vector length) value from
the store (the consumer) to the producer. We didn't expect the use in the
consumer to be in subreg context, though, which is exactly what's happening now
with the grouped-gather change. Thus, we use the AVL of the punned mode
rather than that of the original mode, which has 4x more elements, propagate
it to all other instructions and effectively only operate on one element
instead of four.
The solution is to account for subreg uses and scale the AVL appropriately in
avlprop.
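
A minimal sketch of the scaling arithmetic, not the actual avlprop code. The
helper name scale_avl_for_subreg and its parameters are hypothetical, and it
only covers the case described here, where the producer's mode has a whole
multiple of the punned mode's element count (for example, 16 8-bit elements
viewed as four 32-bit elements, an assumed instance of the 4x ratio).

```c
#include <assert.h>

/* Hypothetical helper: given the AVL used by a consumer that sees the
   producer's result through a subreg, return the AVL the producer needs.
   producer_nunits is the element count of the producer's (original) mode,
   consumer_nunits the element count of the punned mode in the consumer. */
static unsigned scale_avl_for_subreg(unsigned consumer_avl,
                                     unsigned producer_nunits,
                                     unsigned consumer_nunits)
{
    /* Only handle the simple case of an exact integer ratio. */
    assert(consumer_nunits > 0 && producer_nunits % consumer_nunits == 0);
    return consumer_avl * (producer_nunits / consumer_nunits);
}
```

With a consumer AVL of 4 and a 4x element ratio, the producer needs an AVL of
16; propagating the unscaled 4 would cover only a quarter of its elements.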