The following snippet runs about 25% faster on my 1333MHz 7447A when built with -Os than with -O2, even when both are compiled with -mcpu=G4:

const unsigned c=0x1010101;
unsigned z=0;
for(int i=c; i>=2; i*=c) {
    z+=i&1;
}

With -mcpu=G4 -O2 it gives:

        lis 0,0x101
        mr 31,3
        ori 10,0,257
.L4:
        slwi 0,10,8
        rlwinm 11,10,0,31,31
        add 0,0,10
        add 28,28,11
        slwi 9,0,16
        add 10,0,9
        cmplwi 7,10,1
        bgt+ 7,.L4

With -mcpu=G4 -Os it gives:

        lis 0,0x101
        ori 9,0,257
        mr 30,3
        mr 11,9
.L4:
        rlwinm 0,9,0,31,31
        mullw 9,9,11
        add 31,31,0
        cmplwi 7,9,1
        bgt+ 7,.L4

The latter is about 25% faster, which suggests that the G4's multiplier is much faster than GCC assumes. With other constants the difference tends to be smaller, but I haven't encountered a case where the mullw version is slower (among the cases where -Os and -O2 generate different code).

d...@bronko:~/c-Test$ g++ -v
Using built-in specs.
Target: powerpc-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.2-1.1' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-softfloat --enable-secureplt --enable-targets=powerpc-linux,powerpc64-linux --with-cpu=default32 --with-long-double-128 --enable-checking=release --build=powerpc-linux-gnu --host=powerpc-linux-gnu --target=powerpc-linux-gnu
Thread model: posix
gcc version 4.3.2 (Debian 4.3.2-1.1)

--
           Summary: MULLW often faster than SLWI ADD SLWI ADD..
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dom at wtal dot de
 GCC build triplet: PowerPC-Debian-Lenny
  GCC host triplet: PowerPC-Debian-Lenny
GCC target triplet: PowerPC-Debian-Lenny

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38939