------- Comment #4 from jv244 at cam dot ac dot uk  2007-03-02 09:55 -------
(In reply to comment #3)
> On my "AMD Athlon(tm) 64 X2 Dual Core Processor 4800+", gfortran is in x86_64
> mode only 13% slower:
> gfortran: Kernel time 5.872366, real 0m33.121s; user 0m32.898s; sys 0m0.088s.
> Ifort:    Kernel time 5.244328, real 0m28.893s, user 0m28.758s, sys 0m0.076s.
> Options: "ifort -xP -O3 -xW -free" and "gfortran -O3 -march=native -ffast-math
> -ffree-form -ftree-vectorize -funroll-loops".
> 
> For grid_fast.F, one difference is which loops are vectorized; ifort vectorizes
> the loops in line 44, 469, 483 and 496, gfortran only vectorizes the loops in
> line 496 and 469; for the other ones:
> 
> grid_fast.F:44: note: not vectorized: complicated access pattern.
>           DO lz=1,lz_max(lxy)
>              lxyz=lxyz+1
>              pyx(1,lxy)=pyx(1,lxy)+pzyx(lxyz)*polz(lxyz,kg)
>              pyx(2,lxy)=pyx(2,lxy)+pzyx(lxyz)*polz(lxyz,kg2)
>           ENDDO

This might matter a bit, but this loop is not in an inner loop, so I don't think
it accounts for much of the time. Having it vectorized would be good, of course.
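
For illustration only, here is a minimal sketch of one common way such a
reduction is restructured: sum into two scalars and store into pyx once after
the loop. I have not checked whether this changes what gfortran reports for
grid_fast.F:44. The array sizes, the values, and the fact that lxyz starts at
zero are assumptions made up for this sketch; only the names pyx, pzyx, polz,
kg, kg2, lxy, lz and lxyz come from the quoted loop.

```fortran
! Sketch only: sizes and values are invented; names follow the quoted loop.
PROGRAM scalar_accumulators
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  INTEGER, PARAMETER :: n = 8, kg = 1, kg2 = 2, lxy = 1
  REAL(dp) :: pzyx(n), polz(n,2), pyx(2,1), ref(2,1)
  REAL(dp) :: s1, s2
  INTEGER :: lz, lxyz

  ! Fill the inputs with simple test values
  DO lz = 1, n
     pzyx(lz)   = REAL(lz, dp)
     polz(lz,1) = 0.5_dp*REAL(lz, dp)
     polz(lz,2) = 2.0_dp
  ENDDO

  ! Form as quoted: accumulate directly into pyx(1:2,lxy)
  ref = 0.0_dp
  lxyz = 0
  DO lz = 1, n
     lxyz = lxyz + 1
     ref(1,lxy) = ref(1,lxy) + pzyx(lxyz)*polz(lxyz,kg)
     ref(2,lxy) = ref(2,lxy) + pzyx(lxyz)*polz(lxyz,kg2)
  ENDDO

  ! Variant: scalar accumulators, indexed by the loop variable
  ! (assumes lxyz starts at 0 here, so lxyz == lz inside the loop)
  pyx = 0.0_dp
  s1 = 0.0_dp
  s2 = 0.0_dp
  DO lz = 1, n
     s1 = s1 + pzyx(lz)*polz(lz,kg)
     s2 = s2 + pzyx(lz)*polz(lz,kg2)
  ENDDO
  pyx(1,lxy) = pyx(1,lxy) + s1
  pyx(2,lxy) = pyx(2,lxy) + s2

  ! Both forms should agree
  PRINT *, 'max abs difference:', MAXVAL(ABS(pyx - ref))
END PROGRAM scalar_accumulators
```

In the real routine lxyz carries over from one lxy to the next, so the index
would be an offset plus lz rather than lz itself.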

> 
> grid_fast.F:483: note: not vectorized: can't determine dependence between
> (*coef_447)[D.1967_2320] and (*coef_447)[D.1967_2320]
>               DO icoef=1,coef_max
>                  coef(icoef,1)=coef(icoef,1)+alpha(icoef,lx)*g1
>                  coef(icoef,2)=coef(icoef,2)+alpha(icoef,lx)*g2
>                  coef(icoef,3)=coef(icoef,3)+alpha(icoef,lx)*g1k
>                  coef(icoef,4)=coef(icoef,4)+alpha(icoef,lx)*g2k
>               ENDDO
> 

This part, which is in the default branch of the switch statement, should only
be executed in rare cases. I doubt it matters much for the overall timings.
Also, this loop has a very short trip count (i.e. coef_max should, for the
provided input, be at most 5).
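
As a sketch of how the four updates could be written so that each column is
visibly independent, the loop can be expressed as one array assignment per
column. This is only an illustration under assumed sizes and values (coef_max
= 5, lx = 1, and the g1..g2k values are made up); I have not verified that
gfortran vectorizes this form, and with such short trips it is unlikely to
matter for the timings.

```fortran
! Sketch only: sizes and values are invented; names follow the quoted loop.
PROGRAM column_updates
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  INTEGER, PARAMETER :: coef_max = 5, lx = 1
  REAL(dp) :: coef(coef_max,4), ref(coef_max,4), alpha(coef_max,1)
  REAL(dp) :: g1, g2, g1k, g2k
  INTEGER :: icoef

  g1 = 1.0_dp
  g2 = 2.0_dp
  g1k = 3.0_dp
  g2k = 4.0_dp
  DO icoef = 1, coef_max
     alpha(icoef,lx) = REAL(icoef, dp)
  ENDDO

  ! Form as quoted: four columns updated inside one loop over icoef
  ref = 0.0_dp
  DO icoef = 1, coef_max
     ref(icoef,1) = ref(icoef,1) + alpha(icoef,lx)*g1
     ref(icoef,2) = ref(icoef,2) + alpha(icoef,lx)*g2
     ref(icoef,3) = ref(icoef,3) + alpha(icoef,lx)*g1k
     ref(icoef,4) = ref(icoef,4) + alpha(icoef,lx)*g2k
  ENDDO

  ! Variant: one array-section assignment per column
  coef = 0.0_dp
  coef(1:coef_max,1) = coef(1:coef_max,1) + alpha(1:coef_max,lx)*g1
  coef(1:coef_max,2) = coef(1:coef_max,2) + alpha(1:coef_max,lx)*g2
  coef(1:coef_max,3) = coef(1:coef_max,3) + alpha(1:coef_max,lx)*g1k
  coef(1:coef_max,4) = coef(1:coef_max,4) + alpha(1:coef_max,lx)*g2k

  ! Both forms should agree
  PRINT *, 'max abs difference:', MAXVAL(ABS(coef - ref))
END PROGRAM column_updates
```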


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021
