------- Comment #4 from whaley at cs dot utsa dot edu 2007-06-27 17:00 -------
Andrew,

>PowerPC970FX is not a direct descendent of Power5

Sorry, completely misremembered this. Since Power4 didn't suffer as bad as
Power5 (I think it lost maybe 10% rather than 50), maybe the 970 will also
not die.

>so I think the case is the register allocator is messing up (which is already
>known)

OK, can you point me to the bug report? Is there some way to confirm this is
the problem, rather than the scheduling pass itself?

>The other thing is what options are you using to invoke GCC with?

My Makefile shows them. The gcc3-derived flags are:
   -mcpu=power5 -mtune=power5 -O3 -m64
for gcc4, I get most of my performance back if I add:
   -fno-schedule-insns -fno-rerun-loop-opt

I include below example output and arch info on the machine I created the
benchmark on (forgot to include it before, sorry).

Thanks,
Clint

r78n04 noibm122/TEST> uname -a
Linux r78n04 2.6.5-7.244-pseries64 #1 SMP Mon Dec 12 18:32:25 UTC 2005 ppc64 ppc64 ppc64 GNU/Linux

r78n04 noibm122/TEST> /usr/bin/gcc -v
Reading specs from /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada --disable-checking --libdir=/usr/lib --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit --host=powerpc-suse-linux --build=powerpc-suse-linux --target=powerpc-suse-linux --enable-targets=powerpc64-suse-linux --enable-biarch
Thread model: posix
gcc version 3.3.3 (SuSE Linux)

r78n04 noibm122/TEST> gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/whaley/local/linux --enable-languages=c --with-gmp=/u/noibm122/local/linux --with-mpfr-lib=/u/noibm122/local/linux/lib --with-mpfr-include=/u/noibm122/local/linux/include
Thread model: posix
gcc version 4.2.0

r78n04 TEST/MMBENCH_PPC> make all
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c mmbench.c
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c dgemm_atlas.c
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -o xdmm_gcc3 mmbench.o dgemm_atlas.o
rm -f *.o
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c mmbench.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c dgemm_atlas.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -o xdmm_gcc4 mmbench.o dgemm_atlas.o
rm -f *.o
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c mmbench.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -fno-schedule-insns -fno-rerun-loop-opt -c \
   dgemm_atlas.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -o xdmm_gcc4_nosched mmbench.o dgemm_atlas.o
rm -f *.o
echo "GCC 3.x performance:"
GCC 3.x performance:
./xdmm_gcc3
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     40   1000       0.026     4998.24

echo "GCC 4.2 performance:"
GCC 4.2 performance:
./xdmm_gcc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     40   1000       0.034     3806.35

echo "GCC 4.2 w/o scheduling performance:"
GCC 4.2 w/o scheduling performance:
./xdmm_gcc4_nosched
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     40   1000       0.025     5044.53


-- 

whaley at cs dot utsa dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |c


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523