https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122445

            Bug ID: 122445
           Summary: Wrong code since r16-4391-g85ab3a22ed11c9
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
                CC: vineetg at rivosinc dot com
  Target Milestone: ---
            Target: riscv

Since the introduction of grouped gather, x264 fails with zvl256b and
-mrvv-vector-bits=zvl.

I tracked it down to this loop:

    for( int y = 0; y < 4; y++ )
    {
        for( int x = 0; x < 4; x++ )
            p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
        p_dst += FDEC_STRIDE;
    }

compiled with -O3 -march=rv64gcv_zvl256b -mrvv-vector-bits=zvl
-mtune=generic-ooo.  Since the patch we use a strided store for it.
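
For reference, here is a minimal self-contained sketch of the loop's scalar
semantics.  It is not the x264 source: clip_uint8 is a stand-in for
x264_clip_uint8, add4x4 is a hypothetical wrapper name, and both the
FDEC_STRIDE value and the int16_t type of d are assumptions made only so the
snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the report does not state it. */
#define FDEC_STRIDE 32

/* Stand-in for x264_clip_uint8: saturate an int to the range [0, 255]. */
static inline uint8_t clip_uint8(int x)
{
    return x < 0 ? 0 : (x > 255 ? 255 : (uint8_t)x);
}

/* Hypothetical wrapper: add a 4x4 block of residuals d to the 4x4 pixel
   block at p_dst, whose rows are FDEC_STRIDE bytes apart. */
void add4x4(uint8_t *p_dst, const int16_t d[16])
{
    assert(p_dst != 0 && d != 0);
    for (int y = 0; y < 4; y++)
    {
        for (int x = 0; x < 4; x++)
            p_dst[x] = clip_uint8(p_dst[x] + d[y * 4 + x]);
        p_dst += FDEC_STRIDE;
    }
}
```

Each row written here is four uint8_t values, i.e. 32 bits, and consecutive
rows are a fixed stride apart.  That is presumably what makes a strided store
of wider elements attractive for this loop.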

While the gather/strided code is reasonable, the issue actually lies somewhere
else:  In avlprop we try to backpropagate the AVL (application vector length)
value from the store, the consumer, to the producer.  We didn't expect the use
in the consumer to be in subreg context, though, which is exactly what's
happening now with the grouped-gather changes.  Thus, we use the AVL of the
punned mode rather than that of the original mode, which has 4x more elements,
propagate this to all other instructions and effectively only operate on one
element instead of four.

The solution is to account for subreg uses and scale the AVL appropriately in
avlprop.
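
The scaling itself is plain arithmetic and might be sketched as follows.
This is not GCC's avlprop code: scale_avl and its parameters are hypothetical
names, and the sketch assumes that both modes cover exactly the same bits so
that only the element width differs.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.  Converts an AVL counted in elements
   of the punned (subreg) mode into an AVL counted in elements of the original
   mode, assuming both views span the same number of bits. */
unsigned scale_avl(unsigned avl, unsigned punned_elt_bits,
                   unsigned orig_elt_bits)
{
    unsigned total_bits = avl * punned_elt_bits;

    /* The covered bits must be a whole number of original elements. */
    assert(orig_elt_bits != 0 && total_bits % orig_elt_bits == 0);
    return total_bits / orig_elt_bits;
}
```

With 32-bit punned elements and 8-bit original elements, scale_avl (1, 32, 8)
gives 4, which corresponds to the factor of four described above.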
