https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112387
Bug ID: 112387
Summary: RISC-V: failed to SLP INT64 gather load
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

https://godbolt.org/z/beq8TcGKe

Consider the following case:

void f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

Current RVV GCC can SLP it:

        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a2)
        vsetvli t4,zero,e64,m2,ta,ma
        vsext.vf2 v2,v1
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v v2,(a1),v2
        vsetvli t1,zero,e32,m1,ta,ma
        vadd.vv v2,v2,v4
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v2,0(a0)
        add a3,a3,t5
        add a2,a2,a6
        add a0,a0,a6
        bgtu a7,a4,.L4

However, if we change int -> uint64_t, SLP fails:

void f2 (uint64_t *restrict y, uint64_t *restrict x,
         uint64_t *restrict indices, uint64_t n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

        vsetvli a5,a3,e64,m1,ta,ma
        vlseg2e64.v v2,(a2)      -> unexpected
        slli a4,a5,4
        vsll.vi v4,v2,3
        vsll.vi v1,v3,3
        vluxei64.v v4,(a1),v4
        vluxei64.v v1,(a1),v1
        vadd.vi v2,v4,1
        vadd.vi v3,v1,2
        sub a3,a3,a5
        vsseg2e64.v v2,(a0)      -> unexpected
        add a2,a2,a4
        add a0,a0,a4
        bne a3,zero,.L10

ARM SVE is able to SLP both of them.

I thought this had been fixed by this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635084.html

But it turns out we are still missing something. SLP only succeeds for
int; it fails for the 64-bit element type (uint64_t in f2 above).
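
For convenience, here is a minimal sketch that puts both functions from the report into one translation unit with the <stdint.h> include the snippets need. The bodies of f and f2 are copied from the report. The helper variants_agree and its test data are hypothetical additions, not part of the original report; they only check that the two variants compute the same values, which is independent of whether SLP triggers. Compiler flags are not stated in the report, so something like -O3 -march=rv64gcv is an assumption.

```c
#include <assert.h>   /* for callers that wrap the check in assert() */
#include <stdint.h>

/* 32-bit variant from the report: SLP succeeds per the reporter. */
void f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

/* 64-bit variant from the report: falls back to segment load/store.
   Note that "i < n" compares int64_t against uint64_t, as in the report.  */
void f2 (uint64_t *restrict y, uint64_t *restrict x,
         uint64_t *restrict indices, uint64_t n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

/* Hypothetical helper (not in the report): run both variants on the same
   small input and return 1 if every output element matches, 0 otherwise.  */
int variants_agree (void)
{
  enum { N = 4, XLEN = 8 };
  int x32[XLEN], idx32[2 * N], y32[2 * N];
  uint64_t x64[XLEN], idx64[2 * N], y64[2 * N];

  for (int k = 0; k < XLEN; ++k)
    {
      x32[k] = 10 * k;
      x64[k] = (uint64_t) (10 * k);
    }
  for (int k = 0; k < 2 * N; ++k)
    {
      idx32[k] = (k * 3) % XLEN;          /* always a valid index into x */
      idx64[k] = (uint64_t) idx32[k];
      y32[k] = 0;
      y64[k] = 0;
    }

  f (y32, x32, idx32, N);
  f2 (y64, x64, idx64, N);

  for (int k = 0; k < 2 * N; ++k)
    if ((uint64_t) y32[k] != y64[k])
      return 0;
  return 1;
}
```

Calling assert (variants_agree ()) from a small main is enough to confirm the two loops are semantically identical, so any difference in the generated code comes from the element type alone.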