https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70138
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Simpler testcase not requiring strided stores:

double u[33];

__attribute__((noinline, noclone)) static void
foo (int *x)
{
  double c = 0.0;
  int a, b;
  for (a = 0; a < 33; a++)
    {
      for (b = 0; b < 33; b++)
        c = c + u[a];
      u[a] *= 2.0;
    }
  *x = c;
}

int
main ()
{
  int d, e;
  for (d = 0; d < 33; d++)
    {
      u[d] = (d + 2);
      __asm__ volatile ("" : : : "memory");
    }
  foo (&e);
  if (e != 33 * (2 + 34) / 2 * 33)
    __builtin_abort ();
  return 0;
}

fails to vectorize on the gcc-5-branch because it can't create an epilogue
there.
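
For readers unfamiliar with the term, the following is a rough hand-written model of what "needing an epilogue" means for this testcase. It is not GCC's output. It assumes the outer loop over a is the one being vectorized and assumes a vectorization factor of 2 (two doubles per vector). The names VF, lane, init_u, foo_scalar, foo_vf2_with_epilogue and check_both are made up for illustration. With 33 iterations and VF 2, one iteration is left over and has to run in a scalar epilogue loop after the per-lane partial sums have been reduced.

```c
#include <assert.h>

#define N 33
#define VF 2   /* assumed vectorization factor, not taken from the report */

static double u[N];

/* Same initial values as main in the testcase.  */
static void
init_u (void)
{
  for (int d = 0; d < N; d++)
    u[d] = d + 2;
}

/* Scalar reference: the computation done by foo in the testcase.  */
static int
foo_scalar (void)
{
  double c = 0.0;
  for (int a = 0; a < N; a++)
    {
      for (int b = 0; b < N; b++)
        c = c + u[a];
      u[a] *= 2.0;
    }
  return (int) c;
}

/* Model of an outer loop processed VF iterations at a time.  */
static int
foo_vf2_with_epilogue (void)
{
  double lane[VF] = { 0.0, 0.0 };   /* one partial sum per vector lane */
  int a;

  /* "Vector" loop: covers the first N - N % VF = 32 iterations.  */
  for (a = 0; a + VF <= N; a += VF)
    {
      for (int b = 0; b < N; b++)
        for (int l = 0; l < VF; l++)
          lane[l] += u[a + l];
      for (int l = 0; l < VF; l++)
        u[a + l] *= 2.0;
    }

  /* Reduce the per-lane partial sums to one scalar.  */
  double c = lane[0] + lane[1];

  /* Scalar epilogue loop: the remaining N % VF = 1 iteration.  */
  for (; a < N; a++)
    {
      for (int b = 0; b < N; b++)
        c = c + u[a];
      u[a] *= 2.0;
    }
  return (int) c;
}

/* Both variants must agree with the value the testcase checks for.  */
static void
check_both (void)
{
  init_u ();
  assert (foo_scalar () == 33 * (2 + 34) / 2 * 33);
  init_u ();
  assert (foo_vf2_with_epilogue () == 33 * (2 + 34) / 2 * 33);
}
```

All values involved are small integers, so reassociating the sum across lanes does not change the result in this model; both functions return 19602 after init_u.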