[Bug tree-optimization/115438] [15 Regression] 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 13 Jun 2024 06:08:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438


--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another difference is that for

C       Local r*r norm
        r2=0.0
        do k=2,nzl+1
           do j=1,ny
              do i=1,nx
                 do l=1,nb
                    r(l,i,j,k) = b(l,i,j,k-1) - r(l,i,j,k)
                    r2 =r2+r(l,i,j,k)**2
                    rhat(l,i,j,k) = r(l,i,j,k)
                 enddo
              enddo
           enddo
        enddo

we now end up with hybrid SLP (SLP for the reduction, non-SLP for the
non-grouped stores).  In the end, though, the code in the .optimized dump
looks the same again.

That's expected and will resolve itself.

Another difference is that without SLP we prefer to use a neutral element
as the reduction's initial value, while with SLP we prefer the scalar
initial values, since that is more efficient for SLP reductions and may
also shorten the lifetime of the register holding the initial value.  I
doubt this is the reason for the slowdown, but at least this difference
persists.
