https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117804
Bug ID: 117804
Summary: RISC-V: Worse codegen in mc_chroma of x264
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

#include <stdint.h>
#include <math.h>

void mc_chroma( uint8_t *dst, int i_dst_stride,
                uint8_t *src, int i_src_stride,
                int mvx, int mvy,
                int i_width, int i_height )
{
    uint8_t *srcp;

    int d8x = mvx&0x07;
    int d8y = mvy&0x07;
    int cA = (8-d8x)*(8-d8y);
    int cB = d8x    *(8-d8y);
    int cC = (8-d8x)*d8y;
    int cD = d8x    *d8y;

    src += (mvy >> 3) * i_src_stride + (mvx >> 3);
    srcp = &src[i_src_stride];

    for( int y = 0; y < i_height; y++ )
    {
        for( int x = 0; x < i_width; x++ )
            dst[x] = ( cA*src[x] + cB*src[x+1] + cC*srcp[x] + cD*srcp[x+1] + 32 ) >> 6;
        dst += i_dst_stride;
        src = srcp;
        srcp += i_src_stride;
    }
}

https://godbolt.org/z/6xncTjo88

gcc:

        vzext.vf2       v8,v4
        vzext.vf2       v6,v3
        vzext.vf2       v4,v2
        vmadd.vv        v8,v16,v18
        vzext.vf2       v2,v1
        vmadd.vv        v6,v14,v8
        vmadd.vv        v4,v12,v6
        vmadd.vv        v2,v10,v4

Clang:

        vwmulu.vx       v16, v8, s7
        vwmulu.vx       v20, v12, t3
        vwmaccu.vx      v20, t2, v14
        vwmaccu.vx      v16, s8, v10

Ideally, we should be able to combine these instructions into vwmacc and transform the vmv.v.x instructions into .vx forms.
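
The widening forms in the Clang output suggest that the per-pixel sum can be carried in a type narrower than int. Below is a minimal scalar sketch of that observation. It assumes the weights cA..cD (which always sum to 64) and the pixels are 8-bit values widened to 16 bits. The helper names pixel_ref, pixel_widen16 and widen16_matches_ref are hypothetical, and the code models the arithmetic only, not what either compiler does internally:

```c
#include <stdint.h>

/* Reference: the per-pixel expression from mc_chroma, evaluated in int. */
static uint8_t pixel_ref(int cA, int cB, int cC, int cD,
                         uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((cA * a + cB * b + cC * c + cD * d + 32) >> 6);
}

/* Hypothetical scalar model of a widening multiply followed by widening
 * multiply-accumulates: u8 x u8 products summed in a u16 accumulator.
 * Since cA + cB + cC + cD == 64, the sum is at most 64*255 + 32 = 16352,
 * so the u16 accumulator cannot overflow. */
static uint8_t pixel_widen16(uint8_t cA, uint8_t cB, uint8_t cC, uint8_t cD,
                             uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint16_t acc = (uint16_t)(cA * a);   /* widening multiply            */
    acc = (uint16_t)(acc + cB * b);      /* widening multiply-accumulate */
    acc = (uint16_t)(acc + cC * c);
    acc = (uint16_t)(acc + cD * d);
    acc = (uint16_t)(acc + 32);          /* rounding term                */
    return (uint8_t)(acc >> 6);
}

/* Returns 1 if both versions agree for every (d8x, d8y) pair and a set of
 * boundary pixel values, 0 otherwise. */
static int widen16_matches_ref(void)
{
    static const uint8_t s[] = { 0, 1, 127, 128, 254, 255 };
    const int n = (int)(sizeof s / sizeof s[0]);

    for (int d8x = 0; d8x < 8; d8x++)
        for (int d8y = 0; d8y < 8; d8y++) {
            int cA = (8 - d8x) * (8 - d8y);
            int cB = d8x * (8 - d8y);
            int cC = (8 - d8x) * d8y;
            int cD = d8x * d8y;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < n; k++)
                        for (int l = 0; l < n; l++)
                            if (pixel_ref(cA, cB, cC, cD,
                                          s[i], s[j], s[k], s[l]) !=
                                pixel_widen16((uint8_t)cA, (uint8_t)cB,
                                              (uint8_t)cC, (uint8_t)cD,
                                              s[i], s[j], s[k], s[l]))
                                return 0;
        }
    return 1;
}
```

Calling widen16_matches_ref() from a small test driver checks the 16-bit model against the int reference over all 64 weight combinations. It says nothing about which element widths GCC or Clang actually select for this loop.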