https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98542

            Bug ID: 98542
           Summary: Redundant loads in vectorised loop
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---

For the testcase below, based loosely on one from 450.soplex,
the vectoriser loads the v and i fields twice:

struct s { double v; long i; };
double
f (struct s *x, double *y, int n)
{
  double res = 0;
  for (int i = 0; i < n; ++i)
    res += x[i].v * y[x[i].i];
  return res;
}

SVE loop:

        add     x5, x0, 8
        ...
.L4:
        ld2d    {z2.d - z3.d}, p0/z, [x5, x4, lsl 3]
        ld2d    {z4.d - z5.d}, p0/z, [x0, x4, lsl 3]
        ld1d    z2.d, p0/z, [x1, z2.d, lsl 3]
        incd    x3
        fmla    z0.d, p0/m, z4.d, z2.d
        incw    x4
        whilelo p0.d, w3, w2
        b.any   .L4

where z5 from the second ld2d holds the same x[i].i values as z2
from the first ld2d.  z3 is never used, so the first ld2d is
redundant.
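
To make the overlap concrete, here is a rough scalar model of the two
ld2d loads.  It is only a sketch: VL, model_ld2d and
count_matching_lanes are names made up for this illustration, the lane
count is fixed at 4 rather than taken from the real vector length, and
it assumes an LP64 target where struct s is 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VL 4 /* hypothetical lane count; real SVE lengths vary */

struct s { double v; long i; };
_Static_assert (sizeof (struct s) == 16, "sketch assumes an LP64 target");

/* Scalar model of LD2D: de-interleave VL two-word structures starting
   at SRC into the vectors FIRST and SECOND.  */
static void
model_ld2d (const unsigned char *src, uint64_t first[VL], uint64_t second[VL])
{
  for (int lane = 0; lane < VL; ++lane)
    {
      memcpy (&first[lane], src + 16 * lane, 8);
      memcpy (&second[lane], src + 16 * lane + 8, 8);
    }
}

/* Return the number of lanes in which z5 (from the load at x0) matches
   z2 (from the load at x0 + 8).  */
int
count_matching_lanes (void)
{
  /* One extra element so that the modelled load from x0 + 8 stays in
     bounds.  */
  struct s x[VL + 1];
  for (int j = 0; j < VL + 1; ++j)
    {
      x[j].v = j + 0.5;
      x[j].i = 100 + j;
    }

  uint64_t z2[VL], z3[VL], z4[VL], z5[VL];
  const unsigned char *x0 = (const unsigned char *) x;
  model_ld2d (x0 + 8, z2, z3); /* z2 = x[j].i, z3 = x[j + 1].v */
  model_ld2d (x0, z4, z5);     /* z4 = x[j].v, z5 = x[j].i */

  int matches = 0;
  for (int lane = 0; lane < VL; ++lane)
    matches += (z2[lane] == z5[lane]);
  return matches;
}
```

Every lane of z2 matches the corresponding lane of z5 in this model,
so the gather could use z5 from the load at x0 directly.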

In the soplex example, "i" is instead a 32-bit value,
but I guess we need to fix this case first.
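
For reference, the 32-bit variant might look something like the
following.  This is a guess at the shape only: struct s32 and f_int32
are hypothetical names, and nothing here is copied from the soplex
sources beyond the fact that the index is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: only the 32-bit index width comes from the
   report; the names are invented for this sketch.  */
struct s32 { double v; int32_t i; };

double
f_int32 (struct s32 *x, double *y, int n)
{
  double res = 0;
  for (int i = 0; i < n; ++i)
    res += x[i].v * y[x[i].i];
  return res;
}
```

Here the two fields have different widths, so the loads can no longer
be treated as a simple two-element interleave of 64-bit values.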
