GCC 4.x tree optimization decides to put int values into long long int temporaries. When RTL expansion comes around, the expander sees only a DImode multiply and so generates three SImode multiplies to implement it.
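
The report does not include the test case. As a rough illustration only, the sketch below assumes source of the following shape: int operands multiplied into a 64-bit accumulator. The function names `dot_widening` and `mul64_via_three_32bit` are hypothetical and not taken from the report. The first function is the kind of code where a single widening 32x32->64 multiply (the mulsidi3 pattern named in the summary) would suffice; the second spells out, in C, how a generic 64x64->64 multiply can be built from three 32-bit multiplies, which is roughly the extra work a DImode multiply implies on a 32-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: both operands are 32-bit, so each product fits in
 * 64 bits and one widening 32x32->64 multiply per element is enough. */
int64_t dot_widening(const int32_t *a, const int32_t *b, int n)
{
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        /* Cast one operand first so the multiply is done in 64 bits. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}

/* Illustration of a generic 64x64->64 multiply split into 32-bit pieces:
 * one widening multiply for the low halves plus two 32-bit multiplies
 * for the cross terms (the high*high term falls outside 64 bits). */
uint64_t mul64_via_three_32bit(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    uint64_t low = (uint64_t)xl * yl;   /* 32x32->64 */
    uint32_t cross = xh * yl + xl * yh; /* two 32x32->32, modulo 2^32 */

    return low + ((uint64_t)cross << 32);
}
```

Compiling something like `dot_widening` at -O3 for the ARM target and counting the multiplies in the assembly output is one way to compare compiler versions; whether this exact shape reproduces the regression is an assumption.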
GCC 3.x sees that the source values are SImode and uses mulsidi3 to generate 32x32->64 multiplies, which are much more efficient. It also picks up the accumulation.

(using -O3 for all compilations)
GCC 3.4 has an 84-byte stack frame, and a body of 372 instructions.
GCC 4.1 has a 1416-byte stack frame, and a body of 1668 instructions.
GCC 4.2 has a 1320-byte stack frame, and a body of 1565 instructions.

--
Summary: 4.1, 4.2 (possibly 4.0?) not using mulsidi3
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: eplondke at gmail dot com
GCC host triplet: x86_64-suse-linux
GCC target triplet: arm-unkown-elf

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29274