https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Little bit convoluted testcase:

double a[1024];
int bar();
void foo (int n)
{
  double x = 0, y = 0;
  int i = 1023;
  do
    {
      x += a[i] + a[i+1];
      y += a[i] / a[i+1];
      if (bar ())
        break;
    }
  while (--i);
  a[0] = x;
  a[1] = y;
}

where we end up with the {x, y} vector CTOR inside the loop (and even
spill/reload it because of the call). We have a PHI node-only feed for
the vectorized store:

t.c:16:8: note: Vectorizing SLP tree:
t.c:16:8: note: node 0x3b21ee0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: a[0] = x_22;
t.c:16:8: note: stmt 0 a[0] = x_22;
t.c:16:8: note: stmt 1 a[1] = y_21;
t.c:16:8: note: children 0x3b21f68
t.c:16:8: note: node 0x3b21f68 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_22 = PHI <x_26(9), x_25(10)>
t.c:16:8: note: stmt 0 x_22 = PHI <x_26(9), x_25(10)>
t.c:16:8: note: stmt 1 y_21 = PHI <y_24(9), y_23(10)>
t.c:16:8: note: children 0x3b21ff0 0x3b22210
t.c:16:8: note: node 0x3b21ff0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_26 = PHI <x_14(3)>
t.c:16:8: note: stmt 0 x_26 = PHI <x_14(3)>
t.c:16:8: note: stmt 1 y_24 = PHI <y_15(3)>
t.c:16:8: note: children 0x3b22320
t.c:16:8: note: node (external) 0x3b22320 (max_nunits=1, refcnt=1)
t.c:16:8: note: { x_14, y_15 }
t.c:16:8: note: node 0x3b22210 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_25 = PHI <x_14(4)>
t.c:16:8: note: stmt 0 x_25 = PHI <x_14(4)>
t.c:16:8: note: stmt 1 y_23 = PHI <y_15(4)>
t.c:16:8: note: children 0x3b223a8
t.c:16:8: note: node (external) 0x3b223a8 (max_nunits=1, refcnt=1)
t.c:16:8: note: { x_14, y_15 }

Fixing this issue fixes the slowdown. Testing a patch.
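
For readers who want to see the two placements of the {x, y} constructor side by side, here is a hand-written sketch. It is not the GIMPLE GCC produces and not the patch under test. It only mimics, at source level, building the vector inside the loop versus building it once after the loop exit. Several things are assumptions made for illustration: the `v2df` typedef (GCC/Clang vector extension), the stub `bar()` driven by a `calls_left` counter, the helper names, and the array being padded to 1025 elements so that `a[i+1]` stays in bounds when the code is actually run.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* The testcase declares a[1024]; one extra element is added here
   (assumption) so that a[i+1] with i == 1023 is in bounds at run time. */
static double a[1025];

/* Hypothetical stand-in for the external bar(): returns nonzero on the
   calls_left-th call, and never if calls_left starts at 0. */
static int calls_left;
static int bar(void) { return --calls_left == 0; }

static void fill(void)
{
    for (int i = 0; i < 1025; i++)
        a[i] = i + 1.0;
}

/* Mimics the slow form: the {x, y} vector is rebuilt on every iteration
   and is live across the call to bar(). */
static void foo_ctor_in_loop(void)
{
    double x = 0, y = 0;
    v2df v = { 0, 0 };
    int i = 1023;
    do {
        x += a[i] + a[i + 1];
        y += a[i] / a[i + 1];
        v = (v2df){ x, y };
        if (bar())
            break;
    } while (--i);
    memcpy(&a[0], &v, sizeof v);   /* vector store to a[0], a[1] */
}

/* Mimics the desired form: scalars stay scalar in the loop, and the
   vector is built once after the exit, right before the store. */
static void foo_ctor_after_loop(void)
{
    double x = 0, y = 0;
    int i = 1023;
    do {
        x += a[i] + a[i + 1];
        y += a[i] / a[i + 1];
        if (bar())
            break;
    } while (--i);
    v2df v = { x, y };
    memcpy(&a[0], &v, sizeof v);
}

/* Returns 1 if both variants store the same a[0] and a[1] when bar()
   breaks out on call number stop_after (0 means it never breaks). */
int variants_agree(int stop_after)
{
    fill();
    calls_left = stop_after;
    foo_ctor_in_loop();
    double x1 = a[0], y1 = a[1];

    fill();
    calls_left = stop_after;
    foo_ctor_after_loop();
    return x1 == a[0] && y1 == a[1];
}

/* Checks a few exit points, including the case with no early exit. */
void self_check(void)
{
    assert(variants_agree(0));
    assert(variants_agree(1));
    assert(variants_agree(2));
    assert(variants_agree(500));
}
```

Both functions compute the same values; the difference is only where the vector is formed, which is the placement the comment above is about. Whether a given compiler keeps the in-loop construction as written is not guaranteed, so this should be read as an illustration rather than a reproducer.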