------- Comment #27 from hjl at lucon dot org 2006-06-29 02:32 -------
Created an attachment (id=11777): An integer loop
http://gcc.gnu.org/bugzilla/attachment.cgi?id=11777&action=view
I changed the loop from double to long long. On Nocona, the 64-bit code generated by gcc 4.0 is about 10% slower than the code generated by gcc 3.4 (second run below). With -m32 (first run) the gap is only about 2%.

/usr/gcc-3.4/bin/gcc -m32 -fomit-frame-pointer -O -c mmbench.c
/usr/gcc-3.4/bin/gcc -m32 -fomit-frame-pointer -O -c gemm_atlas.c
/usr/gcc-3.4/bin/gcc -m32 -fomit-frame-pointer -O -o xmm_gcc mmbench.o gemm_atlas.o
rm -f *.o
/usr/gcc-4.0/bin/gcc -m32 -fomit-frame-pointer -O -c mmbench.c
/usr/gcc-4.0/bin/gcc -m32 -fomit-frame-pointer -O -c gemm_atlas.c
/usr/gcc-4.0/bin/gcc -m32 -fomit-frame-pointer -O -o xmm_gc4 mmbench.o gemm_atlas.o
rm -f *.o
echo "GCC 3.x performance:"
GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60    250       0.381      283.51
echo "GCC 4.x performance:"
GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60    250       0.389      277.68

gnu-16:pts/2[5]> make ~/bugs/gcc/27827/loop
/usr/gcc-3.4/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c mmbench.c
/usr/gcc-3.4/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c gemm_atlas.c
/usr/gcc-3.4/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xmm_gcc mmbench.o gemm_atlas.o
rm -f *.o
/usr/gcc-4.0/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c mmbench.c
/usr/gcc-4.0/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -c gemm_atlas.c
/usr/gcc-4.0/bin/gcc -DREPS=1000 -fomit-frame-pointer -O -o xmm_gc4 mmbench.o gemm_atlas.o
rm -f *.o
echo "GCC 3.x performance:"
GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.172     2512.01
echo "GCC 4.x performance:"
GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       60   1000       0.193     2238.68

So the problem may also be loop-related.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827