https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111451
            Bug ID: 111451
           Summary: RISC-V: Missed optimization of vrgather.vv into
                    vrgatherei16.vv
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Consider the following case:

#include <stdint.h>

typedef int32_t vnx32si __attribute__ ((vector_size (128)));

#define MASK_2(X, Y) (Y) - 1 - (X), (Y) - 2 - (X)
#define MASK_4(X, Y) MASK_2 (X, Y), MASK_2 (X + 2, Y)
#define MASK_8(X, Y) MASK_4 (X, Y), MASK_4 (X + 4, Y)
#define MASK_16(X, Y) MASK_8 (X, Y), MASK_8 (X + 8, Y)
#define MASK_32(X, Y) MASK_16 (X, Y), MASK_16 (X + 16, Y)
#define MASK_64(X, Y) MASK_32 (X, Y), MASK_32 (X + 32, Y)
#define MASK_128(X, Y) MASK_64 (X, Y), MASK_64 (X + 64, Y)

#define PERMUTE(TYPE, NUNITS)                                                  \
  __attribute__ ((noipa)) void permute_##TYPE (TYPE values1, TYPE values2,    \
                                               TYPE *out)                      \
  {                                                                            \
    TYPE v                                                                     \
      = __builtin_shufflevector (values1, values2, MASK_##NUNITS (0, NUNITS)); \
    *(TYPE *) out = v;                                                         \
  }

#define TEST_ALL(T)                                                            \
  T (vnx32si, 32)

TEST_ALL (PERMUTE)

ASM:

permute_vnx32si:
        li      a5,32
        li      a4,31
        vsetvli zero,a5,e32,m8,ta,ma
        vid.v   v8
        vle32.v v24,0(a0)
        vrsub.vx        v8,v8,a4
        vrgather.vv     v16,v24,v8
        vse32.v v16,0(a2)
        ret

https://godbolt.org/z/Mh77YY91r

Here we use:

vsetvli zero,a5,e32,m8,ta,ma
...
vrgather.vv     v16,v24,v8

With SEW = 32 and LMUL = 8, the index vector register group "v8" occupies
8 registers.

We should optimize this into vrgatherei16.vv, which uses int16 index
elements. With vrgatherei16.vv, the index group at v8 occupies 4 registers
instead of 8, which reduces register consumption and register pressure.
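
To make the register arithmetic above concrete, here is a minimal sketch in
plain C. It assumes the RVV rule that an operand with element width EEW has
EMUL = (EEW / SEW) * LMUL, and that a fractional EMUL still occupies one
register. The helper names group_regs and prefer_ei16 are hypothetical and
only illustrate the reasoning; this is not GCC's actual heuristic, and it
handles integer data LMUL (1, 2, 4, 8) only.

```c
#include <stdbool.h>

/* Registers occupied by an operand whose elements are EEW bits wide, when
   the data operand uses SEW bits and an integer LMUL of 1, 2, 4 or 8.
   EMUL = (EEW / SEW) * LMUL; a fractional EMUL still takes one register.  */
static unsigned
group_regs (unsigned eew, unsigned sew, unsigned lmul)
{
  unsigned emul_x8 = 8u * eew * lmul / sew; /* EMUL scaled by 8.  */
  return emul_x8 <= 8 ? 1 : emul_x8 / 8;
}

/* Hypothetical profitability check (an assumption, not GCC's rule):
   prefer 16-bit indices when every index fits in 16 bits and the index
   register group becomes strictly smaller than with SEW-wide indices.  */
static bool
prefer_ei16 (unsigned sew, unsigned lmul, unsigned max_index)
{
  return sew > 16
	 && max_index <= 0xFFFF
	 && group_regs (16, sew, lmul) < group_regs (sew, sew, lmul);
}
```

For the test case in this report (SEW = 32, LMUL = 8, largest index 31),
group_regs (32, 32, 8) gives 8 and group_regs (16, 32, 8) gives 4, so
prefer_ei16 (32, 8, 31) returns true. With SEW = 32 and LMUL = 1 both index
groups fit in a single register, so the sketch reports no benefit.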