https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
Transform the second loop as follows:

diff --git a/loop.c b/loop.c
index feea9ea..81a3ea6 100644
--- a/loop.c
+++ b/loop.c
@@ -9,6 +9,6 @@ loop (int k, double x)
   for (i=0;i<6;i++)
     r[i] = x * a[i + k];
   for (i=0;i<6;i++)
-    t+=r[5-i];
+    t+=r[i]; -------- using ascending order, aligned with the former loop.
   return t;
 }
 }

This can avoid store-forwarding stalls.

Before loop transform:
loop_avx256: 3710992
loop       :  671995
loop_avx128:  650882

After loop transform:
loop_avx256:  661386
loop       :  652932
loop_avx128:  568710
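
For reference, here is a minimal standalone sketch of the two loop forms, so the transform can be checked outside the original test case. The contents and size of a[], the file-scope r[], and the names loop_descending, loop_ascending and self_check are assumptions made for illustration; they are not taken from loop.c. Note that reversing the summation order can change floating-point rounding in general, so the two forms are only guaranteed to agree exactly when the partial sums are exactly representable, as with the small integer values used here.

```c
#include <assert.h>

#define N 6

/* Assumed stand-in for the array `a` of the test case; values are made up. */
static const double a[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                              9, 10, 11, 12, 13, 14, 15, 16 };

/* Assumed to be a file-scope buffer; the original declaration is not shown. */
static double r[N];

/* Before the transform: r[] is written ascending, then read descending. */
double
loop_descending (int k, double x)
{
  int i;
  double t = 0;
  for (i = 0; i < N; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < N; i++)
    t += r[N - 1 - i];
  return t;
}

/* After the transform: r[] is read in the same order it was written. */
double
loop_ascending (int k, double x)
{
  int i;
  double t = 0;
  for (i = 0; i < N; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < N; i++)
    t += r[i];
  return t;
}

/* Sanity check: both forms agree on inputs whose sums are exact. */
void
self_check (void)
{
  int k;
  for (k = 0; k <= 10; k++)
    assert (loop_descending (k, 2.0) == loop_ascending (k, 2.0));
}
```

The sketch only shows that the two forms compute the same value on such inputs. Whether the descending form actually hits a store-forwarding stall depends on how the compiler vectorizes each loop (for example, a wide load that spans two narrower pending stores generally cannot be forwarded), so the timings have to be reproduced with the compiler options used in this report.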