------- Comment #4 from whaley at cs dot utsa dot edu 2007-06-27 17:00 -------
Andrew,
>PowerPC970FX is not a direct descendent of Power5
Sorry, I completely misremembered this. Since Power4 didn't suffer as badly
as Power5 (I think it lost maybe 10% rather than 50%), maybe the 970 will
also not die.
>so I think the case is the register allocator is messing up (which is already
>known)
OK, can you point me to the bug report? Is there some way to confirm this
is the problem, rather than the scheduling pass itself?
>The other thing is what options are you using to invoke GCC with?
My Makefile shows them. The gcc3-derived flags are:
-mcpu=power5 -mtune=power5 -O3 -m64
For gcc4, I get most of my performance back if I add:
-fno-schedule-insns -fno-rerun-loop-opt
I include below example output and arch info on the machine I created the
benchmark on (forgot to include it before, sorry).
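One way to narrow down which of the two extra flags actually matters is to build the kernel once per flag combination and time each binary. The following is only a minimal sketch: it prints candidate compile lines rather than running them, and it assumes the compiler is invoked as `gcc` and that only dgemm_atlas.c receives the extra flags (as in the make output). The helper names are hypothetical. It cannot by itself tell the register allocator apart from the scheduling pass; it only shows which flag recovers the performance.

```python
from itertools import combinations

# Flags copied from the comment; nothing else is assumed about the Makefile.
BASE_FLAGS = ["-mcpu=power5", "-mtune=power5", "-O3", "-m64"]
EXTRA_FLAGS = ["-fno-schedule-insns", "-fno-rerun-loop-opt"]


def flag_subsets(flags):
    """Return every subset of flags, smallest first (hypothetical helper)."""
    return [list(c) for r in range(len(flags) + 1)
            for c in combinations(flags, r)]


def compile_command(cc, extra, source="dgemm_atlas.c"):
    """Build the argv list for compiling the kernel with the given extra flags."""
    return [cc, "-DREPS=1000", "-DWALL", *BASE_FLAGS, *extra, "-c", source]


# Print one compile line per combination: none, each flag alone, both.
for extra in flag_subsets(EXTRA_FLAGS):
    print(" ".join(compile_command("gcc", extra)))
```

Timing the four resulting binaries against each other would show whether one flag alone accounts for the difference.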
Thanks,
Clint
r78n04 noibm122/TEST> uname -a
Linux r78n04 2.6.5-7.244-pseries64 #1 SMP Mon Dec 12 18:32:25 UTC 2005 ppc64
ppc64 ppc64 GNU/Linux
r78n04 noibm122/TEST> /usr/bin/gcc -v
Reading specs from /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --enable-languages=c,c++,f77,objc,java,ada
--disable-checking --libdir=/usr/lib --enable-libgcj
--with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib
--enable-shared --enable-__cxa_atexit --host=powerpc-suse-linux
--build=powerpc-suse-linux --target=powerpc-suse-linux
--enable-targets=powerpc64-suse-linux --enable-biarch
Thread model: posix
gcc version 3.3.3 (SuSE Linux)
r78n04 noibm122/TEST> gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: ../configure --prefix=/home/whaley/local/linux
--enable-languages=c --with-gmp=/u/noibm122/local/linux
--with-mpfr-lib=/u/noibm122/local/linux/lib
--with-mpfr-include=/u/noibm122/local/linux/include
Thread model: posix
gcc version 4.2.0
r78n04 TEST/MMBENCH_PPC> make all
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c
mmbench.c
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -c
dgemm_atlas.c
/usr/bin/gcc -DREPS=1000 -DWALL -mcpu=power5 -mtune=power5 -O3 -m64 -o
xdmm_gcc3 mmbench.o dgemm_atlas.o
rm -f *.o
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -c mmbench.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -c dgemm_atlas.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -o xdmm_gcc4 mmbench.o dgemm_atlas.o
rm -f *.o
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -c mmbench.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -fno-schedule-insns -fno-rerun-loop-opt -c \
dgemm_atlas.c
/u/noibm122/local/linux/home/whaley/local/linux/bin/gcc -DREPS=1000 -DWALL
-mcpu=power5 -mtune=power5 -O3 -m64 -o xdmm_gcc4_nosched mmbench.o
dgemm_atlas.o
rm -f *.o
echo "GCC 3.x performance:"
GCC 3.x performance:
./xdmm_gcc3
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       40   1000       0.026     4998.24
echo "GCC 4.2 performance:"
GCC 4.2 performance:
./xdmm_gcc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       40   1000       0.034     3806.35
echo "GCC 4.2 w/o scheduling performance:"
GCC 4.2 w/o scheduling performance:
./xdmm_gcc4_nosched
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
atlasmm       40   1000       0.025     5044.53
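To put the three runs above on a common scale, the MFLOPS figures can be expressed as a percentage change relative to the gcc 3.3.3 build. This is a small arithmetic sketch using only the numbers reported above; the dictionary keys and the function name are illustrative.

```python
# MFLOPS figures copied from the three benchmark runs above.
mflops = {"gcc3": 4998.24, "gcc4": 3806.35, "gcc4_nosched": 5044.53}


def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


baseline = mflops["gcc3"]
for name in ("gcc4", "gcc4_nosched"):
    print(f"{name}: {relative_change(mflops[name], baseline):+.1f}% vs gcc3")
```

On these figures the plain gcc 4.2.0 build comes out roughly 24% slower than gcc 3.3.3, while the build with the two extra flags is within about 1% of it.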
--
whaley at cs dot utsa dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32523