------- Comment #3 from burnus at gcc dot gnu dot org  2007-03-02 09:38 -------
On my "AMD Athlon(tm) 64 X2 Dual Core Processor 4800+", gfortran in x86_64 mode is only 13% slower than ifort:

  gfortran: Kernel time 5.872366; real 0m33.121s, user 0m32.898s, sys 0m0.088s
  ifort:    Kernel time 5.244328; real 0m28.893s, user 0m28.758s, sys 0m0.076s

Options used:

  ifort -xP -O3 -xW -free
  gfortran -O3 -march=native -ffast-math -ffree-form -ftree-vectorize -funroll-loops
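As a quick cross-check of the "13%" figure, the gfortran/ifort ratio can be computed for each of the timings quoted above. This is only a throwaway sketch; the struct and function names are illustrative and not part of the original report.

```c
#include <assert.h>
#include <stddef.h>

/* gfortran vs. ifort timings from the comment above, in seconds. */
struct timing {
    const char *label;
    double gfortran;
    double ifort;
};

static const struct timing timings[] = {
    { "kernel", 5.872366, 5.244328 },
    { "real",   33.121,   28.893   },
    { "user",   32.898,   28.758   },
};

/* Percentage by which `slow` exceeds `fast`; a 1.13x ratio gives 13.0. */
static double slowdown_percent(double slow, double fast)
{
    assert(fast > 0.0);
    return (slow / fast - 1.0) * 100.0;
}
```

Applying slowdown_percent to each row gives about 12.0% for the kernel time, 14.6% for real time and 14.4% for user time, so the quoted 13% lies within that range; the exact figure depends on which timing is used.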
For grid_fast.F, one difference is which loops get vectorized: ifort vectorizes the loops at lines 44, 469, 483 and 496, while gfortran vectorizes only those at lines 469 and 496. For the other two, gfortran reports:

grid_fast.F:44: note: not vectorized: complicated access pattern.

      DO lz=1,lz_max(lxy)
        lxyz=lxyz+1
        pyx(1,lxy)=pyx(1,lxy)+pzyx(lxyz)*polz(lxyz,kg)
        pyx(2,lxy)=pyx(2,lxy)+pzyx(lxyz)*polz(lxyz,kg2)
      ENDDO

grid_fast.F:483: note: not vectorized: can't determine dependence between (*coef_447)[D.1967_2320] and (*coef_447)[D.1967_2320]

      DO icoef=1,coef_max
        coef(icoef,1)=coef(icoef,1)+alpha(icoef,lx)*g1
        coef(icoef,2)=coef(icoef,2)+alpha(icoef,lx)*g2
        coef(icoef,3)=coef(icoef,3)+alpha(icoef,lx)*g1k
        coef(icoef,4)=coef(icoef,4)+alpha(icoef,lx)*g2k
      ENDDO


-- 
burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31021
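To illustrate what the two rejected loops are missing, here are C analogues of them, rewritten into forms the vectorizer generally handles more easily. This is a sketch under stated assumptions and has not been checked against the GCC release discussed here. Function names, argument layout and the 0-based indexing are hypothetical.

For line 44, the "complicated access pattern" is likely caused by the loop-invariant stores to pyx(1,lxy) and pyx(2,lxy) inside the loop. The sketch keeps both reductions in local scalars and stores them once after the loop. Vectorizing a floating-point reduction still requires -ffast-math or an equivalent, which the gfortran options above already include.

For line 483, passing the four coef columns as separate restrict-qualified pointers lets the compiler assume the columns do not overlap. That assumption is what the "can't determine dependence" note is missing.

```c
#include <assert.h>

/* Hypothetical C analogue of grid_fast.F:44. pzyx, polz_kg and polz_kg2
   stand in for pzyx(:), polz(:,kg) and polz(:,kg2); `first` is the 0-based
   index of the first element used. The two pyx(:,lxy) reductions live in
   local scalars, so the loop body only reads memory. Returns the index
   after the last element used, mirroring how lxyz advances. */
static int accumulate_pyx(int n, int first,
                          const double *restrict pzyx,
                          const double *restrict polz_kg,
                          const double *restrict polz_kg2,
                          double *restrict pyx)
{
    assert(n >= 0);
    double s1 = pyx[0];
    double s2 = pyx[1];
    for (int lz = 0; lz < n; ++lz) {
        const int i = first + lz;
        s1 += pzyx[i] * polz_kg[i];
        s2 += pzyx[i] * polz_kg2[i];
    }
    pyx[0] = s1;
    pyx[1] = s2;
    return first + n;
}

/* Hypothetical C analogue of grid_fast.F:483. alpha_lx points at
   alpha(1,lx); c1..c4 point at coef(1,1)..coef(1,4). The restrict
   qualifiers are only valid if the columns do not overlap, i.e. for a
   column-major coef with leading dimension ld, ld >= coef_max. */
static void update_coef(int coef_max, const double *restrict alpha_lx,
                        double g1, double g2, double g1k, double g2k,
                        double *restrict c1, double *restrict c2,
                        double *restrict c3, double *restrict c4)
{
    for (int i = 0; i < coef_max; ++i) {
        const double a = alpha_lx[i];
        c1[i] += a * g1;
        c2[i] += a * g2;
        c3[i] += a * g1k;
        c4[i] += a * g2k;
    }
}
```

For a column-major coef with leading dimension ld, the call would be update_coef(coef_max, alpha_lx, g1, g2, g1k, g2k, coef, coef + ld, coef + 2 * ld, coef + 3 * ld). Whether a given GCC release then vectorizes these loops can be checked with -fopt-info-vec-missed on newer releases. GCC 4.x used -ftree-vectorizer-verbose=N instead.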