------- Comment #3 from rask at gcc dot gnu dot org 2007-10-25 18:58 -------
I see a substantial improvement when testing on the compile farm hardware:

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2212
stepping        : 3
cpu MHz         : 2000.240
cache size      : 1024 KB
...

$ gcc --version | head -n 1
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
$ gcc -O3 ~/pr30801.c && time ./a.out
064069fbc13963b920219c3e939225e38e38e38e3956d81c71c71c71c0ba0f00

real    0m0.555s
user    0m0.552s
sys     0m0.004s

$ (cd ~/build/gcc-x86_64-unknown-linux-gnu/gcc && ./xgcc --version | head -n 1)
xgcc (GCC) 4.3.0 20071022 (experimental)
$ (cd ~/build/gcc-x86_64-unknown-linux-gnu/gcc && ./xgcc -B./ -O3 ~/pr30801.c && time ./a.out)
064069fbc13963b920219c3e939225e38e38e38e3956d81c71c71c71c0ba0f00

real    0m0.455s
user    0m0.452s
sys     0m0.004s

Note that your -march=pentium4 option is rejected without -m32:

$ gcc -march=pentium4 -O3 ~/pr30801.c && time ./a.out
/home/rask/pr30801.c:1: error: CPU you selected does not support x86-64 instruction set
/home/rask/pr30801.c:1: error: CPU you selected does not support x86-64 instruction set

$ gcc -O3 ~/pr30801.c -m32 -march=pentium4 && time ./a.out
064069fbc13963b920219c3e939225e38e38e38e3956d81c71c71c71c0ba0f00

real    0m2.234s
user    0m2.232s
sys     0m0.004s

$ (cd ~/build/gcc-x86_64-unknown-linux-gnu/gcc && ./xgcc -B./ -O3 ~/pr30801.c -m32 -march=pentium4 && time ./a.out)
064069fbc13963b920219c3e939225e38e38e38e3956d81c71c71c71c0ba0f00

real    0m1.488s
user    0m1.484s
sys     0m0.004s

So GCC 4.3 is 22 % faster with just the default -m64 + no -march and an
impressive 50 % faster with -m32 -march=pentium4.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30801
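
For reference, the 22 % and 50 % figures can be reproduced from the "real" times in the transcript if "faster" is read as old time divided by new time, minus one. The sketch below is only an illustration of that arithmetic: the function names are made up here and are not part of the bug report, and the same timings are also shown as a reduction in run time, which gives smaller numbers.

```python
# Wall-clock ("real") times in seconds, copied from the transcript:
# (GCC 4.1.2, GCC 4.3.0 experimental) for each set of flags.
timings = {
    "-O3 (default -m64)": (0.555, 0.455),
    "-O3 -m32 -march=pentium4": (2.234, 1.488),
}


def speedup_percent(old, new):
    """Percent by which the new binary is faster: (old / new - 1) * 100."""
    return (old / new - 1.0) * 100.0


def time_saved_percent(old, new):
    """Percent reduction in run time: (1 - new / old) * 100."""
    return (1.0 - new / old) * 100.0


for flags, (old, new) in timings.items():
    print(f"{flags}: {speedup_percent(old, new):.0f} % faster, "
          f"{time_saved_percent(old, new):.0f} % less run time")
```

Under that reading the two configurations come out at about 22 % and 50 % faster, which corresponds to roughly 18 % and 33 % less run time. These are single runs, so the figures are indicative only.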