Consider the following C code:

    #include <inttypes.h>

    void dct2x2dc_dconly( int16_t d[2][2] )
    {
        int d0 = d[0][0] + d[0][1];
        int d1 = d[1][0] + d[1][1];
        d[0][0] = d0 + d1;
        d[0][1] = d0 - d1;
    }

The following is generated with
arm-none-linux-gnueabi-gcc-4.4.0 -O3 -mcpu=cortex-a8 -S dct2x2dc_dconly:

    dct2x2dc_dconly:
        ldrsh   ip, [r0, #2]
        ldrsh   r3, [r0, #0]
        ldrsh   r1, [r0, #6]
        ldrsh   r2, [r0, #4]
        add     r3, ip, r3
        add     r2, r1, r2
        uxth    r3, r3
        uxth    r2, r2
        rsb     r1, r2, r3
        add     r3, r2, r3
        strh    r1, [r0, #2]    @ movhi
        strh    r3, [r0, #0]    @ movhi
        bx      lr

(With pre-ARMv6 targets, the two uxth are replaced by asl #16, lsr #16 pairs.)

The following is generated with
powerpc-unknown-linux-gnu-gcc-4.4.0 -O3 -mcpu=G4 -S dct2x2dc_dconly:

    dct2x2dc_dconly:
        lha     10,2(3)
        lha     0,0(3)
        lha     11,6(3)
        lha     9,4(3)
        add     0,10,0
        rlwinm  0,0,0,0xffff
        add     9,11,9
        rlwinm  9,9,0,0xffff
        subf    11,9,0
        add     0,9,0
        sth     11,2(3)
        sth     0,0(3)
        blr

The two uxth in the ARM version and the two rlwinm in the PPC version are
completely unnecessary: strh/sth store only the low 16 bits, so letting the
store truncate gives equivalent results.

x86 does not exhibit this behaviour. On both ARM and PPC, if either the
d0 + d1 or the d0 - d1 statement is removed, d0 and d1 are no longer
truncated to 16 bits.

powerpc-unknown-linux-gnu-gcc-4.4.0 -v
Using built-in specs.
Target: powerpc-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.4.0/work/gcc-4.4.0/configure --prefix=/usr --bindir=/usr/powerpc-unknown-linux-gnu/gcc-bin/4.4.0 --includedir=/usr/lib/gcc/powerpc-unknown-linux-gnu/4.4.0/include --datadir=/usr/share/gcc-data/powerpc-unknown-linux-gnu/4.4.0 --mandir=/usr/share/gcc-data/powerpc-unknown-linux-gnu/4.4.0/man --infodir=/usr/share/gcc-data/powerpc-unknown-linux-gnu/4.4.0/info --with-gxx-include-dir=/usr/lib/gcc/powerpc-unknown-linux-gnu/4.4.0/include/g++-v4 --host=powerpc-unknown-linux-gnu --build=powerpc-unknown-linux-gnu --enable-altivec --disable-fixed-point --without-ppl --without-cloog --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-multilib --disable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.4.0 p1.1'
Thread model: posix
gcc version 4.4.0 (Gentoo 4.4.0 p1.1)

arm-none-linux-gnueabi-gcc-4.4.0 -v
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: ../gcc-4.4.0/configure --target=arm-none-linux-gnueabi --prefix=/usr/local/arm --enable-threads --with-sysroot=/usr/local/arm/arm-none-linux-gnueabi/libc
Thread model: posix
gcc version 4.4.0 (GCC)

-- 
           Summary: ARM and PPC truncate intermediate operations unnecessarily
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: lessen42+gcc at gmail dot com
     GCC host triplet: i386-apple-darwin
   GCC target triplet: arm-none-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893
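
The claim that the intermediate truncation is redundant can be checked in plain C. The sketch below is only an illustration, not part of the original test case: dct2x2dc_dconly_truncated is a hypothetical model of what the generated code does (d0 and d1 zero-extended from 16 bits, as uxth / rlwinm would), count_mismatches is a hypothetical helper, and the sample values are arbitrary edge cases. It assumes the usual two's-complement wrap when an int is narrowed to int16_t, which is implementation-defined in C but is what GCC does.

```c
#include <stdint.h>

/* The function from the report: intermediates stay at full int width,
   and truncation happens only at the 16-bit stores. */
void dct2x2dc_dconly(int16_t d[2][2])
{
    int d0 = d[0][0] + d[0][1];
    int d1 = d[1][0] + d[1][1];
    d[0][0] = d0 + d1;
    d[0][1] = d0 - d1;
}

/* Hypothetical model of the generated code: d0 and d1 are reduced to
   their low 16 bits before the final add and subtract. */
void dct2x2dc_dconly_truncated(int16_t d[2][2])
{
    uint32_t d0 = (uint16_t)(d[0][0] + d[0][1]);
    uint32_t d1 = (uint16_t)(d[1][0] + d[1][1]);
    d[0][0] = (int16_t)(uint16_t)(d0 + d1);
    d[0][1] = (int16_t)(uint16_t)(d0 - d1);
}

/* Hypothetical helper: run both versions over every combination of a
   few edge-case inputs and count outputs whose low 16 bits differ. */
int count_mismatches(void)
{
    static const int16_t samples[] = {
        INT16_MIN, -32767, -1, 0, 1, 255, 12345, INT16_MAX
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int mismatches = 0;

    for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
    for (int k = 0; k < n; k++)
    for (int l = 0; l < n; l++) {
        int16_t a[2][2] = { { samples[i], samples[j] },
                            { samples[k], samples[l] } };
        int16_t b[2][2] = { { samples[i], samples[j] },
                            { samples[k], samples[l] } };
        dct2x2dc_dconly(a);
        dct2x2dc_dconly_truncated(b);
        if ((uint16_t)a[0][0] != (uint16_t)b[0][0] ||
            (uint16_t)a[0][1] != (uint16_t)b[0][1])
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking that it returns 0 is one way to use this. The result follows from modular arithmetic: the low 16 bits of a sum or difference depend only on the low 16 bits of the operands, so masking d0 and d1 first cannot change what strh/sth writes.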