https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84490

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Actually r253993 was just the changelog part, r253975 was the actual change.
> 
> So I'm doing r254012 vs r254011 instead.

Base = r254011, Peak = r254012

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
436.cactusADM   11950        185       64.6 *   11950        233       51.4 S
436.cactusADM   11950        185       64.6 S   11950        230       51.9 S
436.cactusADM   11950        185       64.6 S   11950        232       51.5 *

so confirmed.  Unsurprisingly:

 53.66%  cactusADM_peak.  cactusADM_peak.amd64-m64-gcc42-nn  [.] bench_staggeredleapfrog2_
 43.53%  cactusADM_base.  cactusADM_base.amd64-m64-gcc42-nn  [.] bench_staggeredleapfrog2_

The difference is that for the fast version (r254011) we peel for alignment.
And we're probably lucky in that we end up aligning all stores, given that
they look like this:

      do k = 2,nz-1
...
         do j=2,ny-1
            do i=2,nx-1
...
               ADM_kxx_stag(i,j,k) = ADM_kxx_stag_p(i,j,k)+
     &              dkdt_dkxxdt*dt
...
               ADM_kxy_stag(i,j,k) = ADM_kxy_stag_p(i,j,k)+
     &              dkdt_dkxydt*dt
...
               ADM_kxz_stag(i,j,k) = ADM_kxz_stag_p(i,j,k)+
     &              dkdt_dkxzdt*dt
...
               ADM_kyy_stag(i,j,k) = ADM_kyy_stag_p(i,j,k)+
     &              dkdt_dkyydt*dt
...
               ADM_kyz_stag(i,j,k) = ADM_kyz_stag_p(i,j,k)+
     &              dkdt_dkyzdt*dt
...
               ADM_kzz_stag(i,j,k) = ADM_kzz_stag_p(i,j,k)+
     &              dkdt_dkzzdt*dt
            end do
         end do
      end do

All arrays have the same shape (though that fact isn't exposed in the IL),
and assuming the incoming pointers share the same alignment, we could have
"guessed" that we are actually aligning more than one store.
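
To make the "guess" concrete, here is a minimal sketch in C (not GCC's
vectorizer code) of why one peel count can align several stores at once.
The helper names, the use of raw addresses, and the 16-byte vector width
in the usage note are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar iterations to peel so that (addr + peel * elem_bytes) is
   aligned to vec_bytes.  Assumes vec_bytes is a power of two and addr
   is already a multiple of elem_bytes.  Hypothetical helper.  */
static size_t peel_count(uintptr_t addr, size_t vec_bytes, size_t elem_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    assert(addr % elem_bytes == 0);
    size_t misalign = (size_t)(addr & (vec_bytes - 1));
    return misalign ? (vec_bytes - misalign) / elem_bytes : 0;
}

/* Returns 1 if every address in addrs[0..n-1] is vec_bytes-aligned
   after advancing by `peel` elements, 0 otherwise.  */
static int all_aligned_after_peel(const uintptr_t *addrs, size_t n,
                                  size_t peel, size_t vec_bytes,
                                  size_t elem_bytes)
{
    for (size_t i = 0; i < n; i++)
        if ((addrs[i] + peel * elem_bytes) & (vec_bytes - 1))
            return 0;
    return 1;
}
```

If the peel count is computed from one store address and the other store
addresses have the same misalignment, all_aligned_after_peel returns 1;
a single store with a different misalignment makes it return 0.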

Both variants spill _a lot_ (but hopefully to aligned stack slots),
so we are most probably store-bound here (I haven't actually verified that).

Disabling peeling for alignment on r254011 results in

436.cactusADM   11950        207       57.7 *    

so that's only half-way to the r254012 run time.  Disabling peeling for
alignment on r254012 does nothing (as expected).  So there's more than just
peeling for alignment here.
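
The "half-way" estimate can be checked with a few lines of C. This is
only an arithmetic sketch using the run times quoted above (185 for
r254011, 207 for r254011 without peeling, 232 for the selected r254012
run); the function name is made up for illustration.

```c
#include <assert.h>

/* Fraction of the base-to-peak slowdown that is reproduced by an
   intermediate configuration.  Hypothetical helper.  */
static double fraction_explained(double base, double partial, double peak)
{
    assert(peak != base);
    return (partial - base) / (peak - base);
}
```

fraction_explained(185.0, 207.0, 232.0) evaluates to 22/47, i.e. a bit
under half of the regression.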
