https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84490
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Actually r253993 was just the changelog part, r253975 was the actual change.
>
> So I'm doing r254012 vs r254011 instead.

Base = r254011, Peak = r254012

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
436.cactusADM   11950    185         64.6   *   11950    233         51.4   S
436.cactusADM   11950    185         64.6   S   11950    230         51.9   S
436.cactusADM   11950    185         64.6   S   11950    232         51.5   *

so confirmed. Unsurprisingly:

  53.66%  cactusADM_peak.  cactusADM_peak.amd64-m64-gcc42-nn  [.] bench_staggeredleapfrog2_
  43.53%  cactusADM_base.  cactusADM_base.amd64-m64-gcc42-nn  [.] bench_staggeredleapfrog2_

and the difference is that for the fast version (base, r254011) we peel for
alignment. And we're probably lucky in that we end up aligning all stores,
given they are

      do k = 2,nz-1
...
         do j=2,ny-1
            do i=2,nx-1
...
               ADM_kxx_stag(i,j,k) = ADM_kxx_stag_p(i,j,k)+
     &              dkdt_dkxxdt*dt
...
               ADM_kxy_stag(i,j,k) = ADM_kxy_stag_p(i,j,k)+
     &              dkdt_dkxydt*dt
...
               ADM_kxz_stag(i,j,k) = ADM_kxz_stag_p(i,j,k)+
     &              dkdt_dkxzdt*dt
...
               ADM_kyy_stag(i,j,k) = ADM_kyy_stag_p(i,j,k)+
     &              dkdt_dkyydt*dt
...
               ADM_kyz_stag(i,j,k) = ADM_kyz_stag_p(i,j,k)+
     &              dkdt_dkyzdt*dt
...
               ADM_kzz_stag(i,j,k) = ADM_kzz_stag_p(i,j,k)+
     &              dkdt_dkzzdt*dt
            end do
         end do
      end do

All arrays have the same shape (but that fact isn't exposed in the IL), so
assuming the incoming pointers have the same alignment we could have "guessed"
that peeling actually aligns more than one store.

Both variants spill _a lot_ (but hopefully to aligned stack slots), so we are
most probably store-bound here (I haven't actually verified that).

Disabling peeling for alignment on r254011 results in

436.cactusADM   11950    207         57.7   *

so that's only half-way (207s, versus 185s for base and roughly 232s for peak).
Disabling peeling for alignment on r254012 does nothing (as expected). So
there's more than just peeling for alignment here.