------- Comment #4 from jv244 at cam dot ac dot uk  2007-03-02 09:55 -------
(In reply to comment #3)
> On my "AMD Athlon(tm) 64 X2 Dual Core Processor 4800+", gfortran is in x86_64
> mode only 13% slower:
> gfortran: Kernel time 5.872366, real 0m33.121s; user 0m32.898s; sys 0m0.088s.
> Ifort: Kernel time 5.244328, real 0m28.893s, user 0m28.758s, sys 0m0.076s.
> Options: "ifort -xP -O3 -xW -free" and "gfortran -O3 -march=native -ffast-math
> -ffree-form -ftree-vectorize -funroll-loops".
> 
> For grid_fast.F, one difference is which loops are vectorized; ifort
> vectorizes the loops in line 44, 469, 483 and 496, gfortran only vectorizes
> the loops in line 496 and 469; for the other ones:
> 
> grid_fast.F:44: note: not vectorized: complicated access pattern.
>          DO lz=1,lz_max(lxy)
>             lxyz=lxyz+1
>             pyx(1,lxy)=pyx(1,lxy)+pzyx(lxyz)*polz(lxyz,kg)
>             pyx(2,lxy)=pyx(2,lxy)+pzyx(lxyz)*polz(lxyz,kg2)
>          ENDDO

This might matter a bit, but it is not in an inner loop, so I don't think it
accounts for a lot of time. Having it vectorized would be good, of course.

> 
> grid_fast.F:483: note: not vectorized: can't determine dependence between
> (*coef_447)[D.1967_2320] and (*coef_447)[D.1967_2320]
>             DO icoef=1,coef_max
>                coef(icoef,1)=coef(icoef,1)+alpha(icoef,lx)*g1
>                coef(icoef,2)=coef(icoef,2)+alpha(icoef,lx)*g2
>                coef(icoef,3)=coef(icoef,3)+alpha(icoef,lx)*g1k
>                coef(icoef,4)=coef(icoef,4)+alpha(icoef,lx)*g2k
>             ENDDO
> 

This part, which is in the default branch of the switch statement, should only
be executed in rare cases. I doubt it matters much in the overall timings.
Also, this loop has a very short trip count (i.e. coef_max should, for the
provided input, be at most 5).


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021
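
For illustration only, here is a minimal stand-alone sketch of the loop quoted from grid_fast.F:44, next to a variant in which the running index lxyz is replaced by a hoisted offset and the two sums go into scalars. Everything around the loop body is an assumption: the program name, the array sizes and values, the enclosing loop over lxy, the names base, s1 and s2, and the idea that lxyz runs contiguously across iterations are all made up for the example. Whether this form lets gfortran vectorize the loop has not been checked; the sketch only shows that the two forms compute the same sums.

```fortran
! Hypothetical stand-alone sketch; sizes, values and the outer loop are assumed.
program loop44_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nxy = 3, nz = 4, kg = 1, kg2 = 2
  integer :: lz_max(nxy)
  real(dp) :: pzyx(nxy*nz), polz(nxy*nz, 2)
  real(dp) :: pyx_a(2, nxy), pyx_b(2, nxy)
  real(dp) :: s1, s2
  integer :: lxy, lz, lxyz, base, i

  ! Made-up test data (exactly representable, so the comparison is exact).
  lz_max = (/ 4, 2, 3 /)
  do i = 1, nxy*nz
     pzyx(i) = 0.5_dp * i
     polz(i, 1) = 1.0_dp + i
     polz(i, 2) = 2.0_dp - i
  end do

  ! Form quoted in the report: lxyz is a running index bumped inside the loop.
  pyx_a = 0.0_dp
  lxyz = 0
  do lxy = 1, nxy
     do lz = 1, lz_max(lxy)
        lxyz = lxyz + 1
        pyx_a(1, lxy) = pyx_a(1, lxy) + pzyx(lxyz) * polz(lxyz, kg)
        pyx_a(2, lxy) = pyx_a(2, lxy) + pzyx(lxyz) * polz(lxyz, kg2)
     end do
  end do

  ! Variant: index is base+lz, and the reductions accumulate into scalars.
  pyx_b = 0.0_dp
  base = 0
  do lxy = 1, nxy
     s1 = 0.0_dp
     s2 = 0.0_dp
     do lz = 1, lz_max(lxy)
        s1 = s1 + pzyx(base + lz) * polz(base + lz, kg)
        s2 = s2 + pzyx(base + lz) * polz(base + lz, kg2)
     end do
     pyx_b(1, lxy) = pyx_b(1, lxy) + s1
     pyx_b(2, lxy) = pyx_b(2, lxy) + s2
     base = base + lz_max(lxy)   ! keep the offset in step with lxyz
  end do

  ! Largest absolute difference between the two forms.
  print *, 'max abs difference:', maxval(abs(pyx_a - pyx_b))
end program loop44_sketch
```

With the data above the two forms agree, so the printed difference is zero. In the real code the scalar accumulation changes the order of the additions whenever pyx(:,lxy) is nonzero on entry, which can alter the last bits of the result.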