https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117804

            Bug ID: 117804
           Summary: RISC-V: Worse codegen in mc_chroma of x264
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

#include <stdint.h>
#include <math.h>
void mc_chroma( uint8_t *dst, int i_dst_stride,
                uint8_t *src, int i_src_stride,
                int mvx, int mvy,
                int i_width, int i_height )
{
    uint8_t *srcp;

    int d8x = mvx&0x07;
    int d8y = mvy&0x07;
    int cA = (8-d8x)*(8-d8y);
    int cB = d8x    *(8-d8y);
    int cC = (8-d8x)*d8y;
    int cD = d8x    *d8y;

    src += (mvy >> 3) * i_src_stride + (mvx >> 3);
    srcp = &src[i_src_stride];

    for( int y = 0; y < i_height; y++ )
    {
        for( int x = 0; x < i_width; x++ )
            dst[x] = ( cA*src[x]  + cB*src[x+1] + cC*srcp[x] + cD*srcp[x+1] + 32 ) >> 6;
        dst  += i_dst_stride;
        src   = srcp;
        srcp += i_src_stride;
    }
}
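Why 16-bit accumulation is enough here: for every fractional offset the four bilinear weights sum to 64 and each fits in a byte, so the sum before the shift is at most 64*255 + 32 = 16352. The sketch below checks this exhaustively over d8x, d8y in 0..7. The helper name is made up for illustration and is not part of x264.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the largest value the mc_chroma accumulator can
   reach before ">> 6", over every fractional offset d8x, d8y in 0..7,
   assuming all four source pixels are 255 (the worst case). */
static uint32_t mc_chroma_max_accumulator(void)
{
    uint32_t worst = 0;
    for (int d8x = 0; d8x < 8; d8x++) {
        for (int d8y = 0; d8y < 8; d8y++) {
            /* Same weight formulas as in the test case above. */
            int cA = (8 - d8x) * (8 - d8y);
            int cB = d8x * (8 - d8y);
            int cC = (8 - d8x) * d8y;
            int cD = d8x * d8y;

            /* Bilinear weights always sum to 64 and each fits in a byte,
               so each can serve as the u8 scalar operand of a .vx op. */
            assert(cA + cB + cC + cD == 64);
            assert(cA <= UINT8_MAX && cB <= UINT8_MAX &&
                   cC <= UINT8_MAX && cD <= UINT8_MAX);

            uint32_t acc = (uint32_t)cA * 255u + (uint32_t)cB * 255u +
                           (uint32_t)cC * 255u + (uint32_t)cD * 255u + 32u;
            if (acc > worst)
                worst = acc;
        }
    }
    return worst;
}
```

mc_chroma_max_accumulator() returns 16352, well below UINT16_MAX (65535), so widening u8 to u16 (as both compilers do) never overflows.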

https://godbolt.org/z/6xncTjo88

gcc:

        vzext.vf2       v8,v4
        vzext.vf2       v6,v3
        vzext.vf2       v4,v2
        vmadd.vv        v8,v16,v18
        vzext.vf2       v2,v1
        vmadd.vv        v6,v14,v8
        vmadd.vv        v4,v12,v6
        vmadd.vv        v2,v10,v4

Clang:
        vwmulu.vx       v16, v8, s7
        vwmulu.vx       v20, v12, t3
        vwmaccu.vx      v20, t2, v14
        vwmaccu.vx      v16, s8, v10

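Ideally, GCC should combine the vzext.vf2 + vmadd.vv sequence into
widening multiply(-accumulate) instructions (vwmulu/vwmaccu), and turn the
vmv.v.x broadcasts of the weights plus the .vv operations into .vx
instructions, as Clang does.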
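A scalar model of what the widening sequence computes per output pixel, as a sketch assuming u8 pixels and u8 weights: vwmulu.vx starts a u16 product from a u8 vector element and a u8 scalar, and vwmaccu.vx adds further u8*u8 products into it. The function names are hypothetical. The quoted Clang output appears to keep two partial sums (v16 and v20); this model uses a single chain, which gives the same result because the u16 sum never wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the widening sequence for one pixel:
   vwmulu.vx  : u16 acc  = u8 pixel * u8 weight
   vwmaccu.vx : u16 acc += u8 pixel * u8 weight
   One accumulation chain is used; the u16 sum is at most
   64 * 255 + 32, so splitting it into two partial sums
   gives the same result. */
static uint8_t px_widening(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                           uint8_t cA, uint8_t cB, uint8_t cC, uint8_t cD)
{
    uint16_t acc = (uint16_t)(a * cA);   /* vwmulu.vx  */
    acc = (uint16_t)(acc + b * cB);      /* vwmaccu.vx */
    acc = (uint16_t)(acc + c * cC);      /* vwmaccu.vx */
    acc = (uint16_t)(acc + d * cD);      /* vwmaccu.vx */
    return (uint8_t)((uint16_t)(acc + 32) >> 6);
}

/* Reference: the int expression from the test case, for one pixel. */
static uint8_t px_reference(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                            int cA, int cB, int cC, int cD)
{
    return (uint8_t)((cA * a + cB * b + cC * c + cD * d + 32) >> 6);
}

/* Returns 1 if the model matches the reference for all 64 offsets and
   every combination of a few boundary pixel values, else 0. */
static int widening_matches_reference(void)
{
    static const uint8_t px[] = { 0, 1, 127, 128, 254, 255 };
    const int n = (int)(sizeof px / sizeof px[0]);

    for (int d8x = 0; d8x < 8; d8x++) {
        for (int d8y = 0; d8y < 8; d8y++) {
            int cA = (8 - d8x) * (8 - d8y), cB = d8x * (8 - d8y);
            int cC = (8 - d8x) * d8y,       cD = d8x * d8y;
            assert(cA + cB + cC + cD == 64);

            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < n; k++)
                        for (int l = 0; l < n; l++) {
                            uint8_t a = px[i], b = px[j], c = px[k], d = px[l];
                            uint8_t got = px_widening(a, b, c, d,
                                (uint8_t)cA, (uint8_t)cB, (uint8_t)cC, (uint8_t)cD);
                            if (got != px_reference(a, b, c, d, cA, cB, cC, cD))
                                return 0;
                        }
        }
    }
    return 1;
}
```

widening_matches_reference() compares the u16 model against the int expression over all 64 offsets and a set of boundary pixel values; for example, px_widening(100, 200, 0, 0, 16, 16, 16, 16) gives (1600 + 3200 + 32) >> 6 = 75, the same as the reference.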
