https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
A slightly convoluted testcase:

double a[1025]; /* a[i+1] reads a[1024] when i == 1023 */

int bar();
void foo (int n)
{
  double x = 0, y = 0;
  int i = 1023;
  do
    {
      x += a[i] + a[i+1];
      y += a[i] / a[i+1];
      if (bar ())
        break;
    }
  while (--i);
  a[0] = x;
  a[1] = y;
}

where we end up with the {x, y} vector CTOR inside the loop (and even
spill and reload it around the call to bar).  The vectorized store is
fed only by PHI nodes:

t.c:16:8: note: Vectorizing SLP tree:
t.c:16:8: note: node 0x3b21ee0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: a[0] = x_22;
t.c:16:8: note:         stmt 0 a[0] = x_22;
t.c:16:8: note:         stmt 1 a[1] = y_21;
t.c:16:8: note:         children 0x3b21f68
t.c:16:8: note: node 0x3b21f68 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_22 = PHI <x_26(9), x_25(10)>
t.c:16:8: note:         stmt 0 x_22 = PHI <x_26(9), x_25(10)>
t.c:16:8: note:         stmt 1 y_21 = PHI <y_24(9), y_23(10)>
t.c:16:8: note:         children 0x3b21ff0 0x3b22210
t.c:16:8: note: node 0x3b21ff0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_26 = PHI <x_14(3)>
t.c:16:8: note:         stmt 0 x_26 = PHI <x_14(3)>
t.c:16:8: note:         stmt 1 y_24 = PHI <y_15(3)>
t.c:16:8: note:         children 0x3b22320
t.c:16:8: note: node (external) 0x3b22320 (max_nunits=1, refcnt=1)
t.c:16:8: note:         { x_14, y_15 }
t.c:16:8: note: node 0x3b22210 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_25 = PHI <x_14(4)>
t.c:16:8: note:         stmt 0 x_25 = PHI <x_14(4)>
t.c:16:8: note:         stmt 1 y_23 = PHI <y_15(4)>
t.c:16:8: note:         children 0x3b223a8
t.c:16:8: note: node (external) 0x3b223a8 (max_nunits=1, refcnt=1)
t.c:16:8: note:         { x_14, y_15 }
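For illustration only, here is a rough source-level sketch, using GCC's generic vector extension (`vector_size`), of two placements of the `{x, y}` CTOR: approximately where the vectorized code builds it now (inside the loop, live across the call) versus building it once after the loop exits. This is an approximation under stated assumptions, not GCC's actual output and not the patch being tested. `foo_ctor_in_loop`, `foo_ctor_on_exit`, the `bar` stub with its `calls`/`limit` counters, and the enlarged `a[1025]` are hypothetical additions so that the sketch links and runs.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector extension: two doubles, the shape of the {x, y} CTOR. */
typedef double v2df __attribute__ ((vector_size (16)));
static_assert (sizeof (v2df) == 2 * sizeof (double), "v2df holds exactly {x, y}");

double a[1025];   /* one extra element so a[i+1] stays in bounds for i == 1023 */

/* Hypothetical stand-in for the external bar(): returns 1 on call number
   `limit` (never, if limit == 0), which triggers the early break.  */
static int calls, limit;
int bar (void) { return ++calls == limit; }

/* Approximates the current placement: the CTOR {x, y} is rebuilt on every
   iteration right after the scalar updates, so the vector must stay live
   across the call to bar().  */
void foo_ctor_in_loop (void)
{
  double x = 0, y = 0;
  v2df xy = { 0, 0 };
  int i = 1023;
  do
    {
      x += a[i] + a[i+1];
      y += a[i] / a[i+1];
      xy = (v2df) { x, y };       /* executed on every iteration */
      if (bar ())
        break;
    }
  while (--i);
  memcpy (a, &xy, sizeof xy);     /* one 16-byte store to a[0], a[1] */
}

/* Approximates the desired placement: the loop stays scalar and the CTOR
   is built once on the path where both loop exits meet.  */
void foo_ctor_on_exit (void)
{
  double x = 0, y = 0;
  int i = 1023;
  do
    {
      x += a[i] + a[i+1];
      y += a[i] / a[i+1];
      if (bar ())
        break;
    }
  while (--i);
  v2df xy = { x, y };             /* built once, after the loop */
  memcpy (a, &xy, sizeof xy);
}
```

Both functions store the same values to `a[0]` and `a[1]`; the difference is only how much work sits inside the loop and what has to survive the call to `bar`.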

Fixing this issue fixes the slowdown.  Testing a patch.
