https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112387

            Bug ID: 112387
           Summary: RISC-V: failed to SLP INT64 gather load
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

https://godbolt.org/z/beq8TcGKe

Consider the following case:

#include <stdint.h>

void
f (int *restrict y, int *restrict x,
   int *restrict indices, int n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

Current RVV GCC can SLP this loop:

        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a2)
        vsetvli t4,zero,e64,m2,ta,ma
        vsext.vf2       v2,v1
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v      v2,(a1),v2
        vsetvli t1,zero,e32,m1,ta,ma
        vadd.vv v2,v2,v4
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v2,0(a0)
        add     a3,a3,t5
        add     a2,a2,a6
        add     a0,a0,a6
        bgtu    a7,a4,.L4

However, if we change int to uint64_t, SLP fails:

void
f2 (uint64_t *restrict y, uint64_t *restrict x,
    uint64_t *restrict indices, uint64_t n)
{
  for (int64_t i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

        vsetvli a5,a3,e64,m1,ta,ma
        vlseg2e64.v     v2,(a2)           -> unexpected
        slli    a4,a5,4
        vsll.vi v4,v2,3
        vsll.vi v1,v3,3
        vluxei64.v      v4,(a1),v4
        vluxei64.v      v1,(a1),v1
        vadd.vi v2,v4,1
        vadd.vi v3,v1,2
        sub     a3,a3,a5
        vsseg2e64.v     v2,(a0)           -> unexpected
        add     a2,a2,a4
        add     a0,a0,a4
        bne     a3,zero,.L10

ARM SVE is able to SLP both of them.
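
For reference, here is a minimal scalar sketch of the shape that successful SLP corresponds to for f2: one contiguous walk over the 2 * n index elements, one gather from x, and an addend alternating between 1 and 2, instead of separate even and odd lanes. The name f2_slp_shape and the addend table are hypothetical and exist only for illustration; this is not what GCC generates internally.

```c
#include <stdint.h>

/* Hypothetical hand-written equivalent of f2, for illustration only.
   The two statements of the original loop body are merged into a
   single statement over 2 * n contiguous elements:
     - indices[j] is read contiguously (unit-stride load),
     - x[indices[j]] is a single gather,
     - the addend alternates 1, 2, 1, 2, ... (a constant vector),
     - y[j] is written contiguously (unit-stride store).  */
void
f2_slp_shape (uint64_t *restrict y, uint64_t *restrict x,
              uint64_t *restrict indices, uint64_t n)
{
  static const uint64_t addend[2] = { 1, 2 };

  for (uint64_t j = 0; j < 2 * n; ++j)
    y[j] = x[indices[j]] + addend[j & 1];
}
```

This matches the int code above, where a single vle32.v, a single vluxei64.v, a vadd.vv and a single vse32.v are used. The uint64_t code instead uses vlseg2e64.v and vsseg2e64.v with two separate gathers.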

I thought this had been fixed by this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635084.html

But it turns out we are still missing something. SLP succeeds only with int
and fails with 64-bit element types (uint64_t in the example above).
