http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49358
Summary: optimization regression in 4.6.1 from 4.5.4
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: v.hais...@sh.cvut.cz

The code:
--------8<----------
double f (double y) {
  return y*y*y*y;
}
--------8<----------

This compiles using version 4.5.4 20110602 (prerelease) (FreeBSD Ports Collection) to this:
--------8<----------
_Z1fd:
.LFB87:
	.cfi_startproc
	movapd	%xmm0, %xmm1	# y, y
	mulsd	%xmm1, %xmm0	# y, tmp63
	mulsd	%xmm1, %xmm0	# y, tmp63
	mulsd	%xmm1, %xmm0	# y, tmp63
	ret
--------8<----------

But gcc version 4.6.1 20110408 (prerelease) (GCC) produces what seems to be a less optimal version:
--------8<----------
_Z1fd:
.LFB85:
	.cfi_startproc
	movapd	%xmm0, %xmm1	# y, tmp64
	mulsd	%xmm0, %xmm1	# y, tmp64
	mulsd	%xmm0, %xmm1	# y, tmp64
	mulsd	%xmm0, %xmm1	# y, tmp64
	movapd	%xmm1, %xmm0	# tmp64, <retval>
	ret
--------8<----------

GCC 4.5.4 seems to be smarter in register allocation, avoiding the final movapd.

In both cases I have used the -O3 -fverbose-asm -save-temps parameters to g++.
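
For readers who do not read AT&T syntax every day, the difference between the two listings can be sketched at source level. This is only an illustration, not compiler output: the function names f_accumulate_in_arg and f_accumulate_in_temp are hypothetical, and the local variables named xmm0 and xmm1 merely stand in for the registers of the same name. Both models perform the same three multiplications in the same order; the second one needs an extra copy because it accumulates in the scratch register instead of in the register that holds the return value.

```cpp
// The function from the report, unchanged.
double f (double y) { return y*y*y*y; }

// Hypothetical model of the 4.5.4 listing: y is copied aside once and the
// product is accumulated in xmm0, which is also the return register.
double f_accumulate_in_arg (double y)
{
  double xmm0 = y;     // argument arrives in xmm0
  double xmm1 = xmm0;  // movapd %xmm0, %xmm1
  xmm0 *= xmm1;        // mulsd  %xmm1, %xmm0
  xmm0 *= xmm1;        // mulsd  %xmm1, %xmm0
  xmm0 *= xmm1;        // mulsd  %xmm1, %xmm0
  return xmm0;         // ret (result is already in xmm0)
}

// Hypothetical model of the 4.6.1 listing: the product is accumulated in
// xmm1, so one more copy is needed to move it into the return register.
double f_accumulate_in_temp (double y)
{
  double xmm0 = y;     // argument arrives in xmm0
  double xmm1 = xmm0;  // movapd %xmm0, %xmm1
  xmm1 *= xmm0;        // mulsd  %xmm0, %xmm1
  xmm1 *= xmm0;        // mulsd  %xmm0, %xmm1
  xmm1 *= xmm0;        // mulsd  %xmm0, %xmm1
  xmm0 = xmm1;         // movapd %xmm1, %xmm0 (the extra move)
  return xmm0;         // ret
}
```

Since the multiplications are identical, the two models return the same value as f for any input; only the number of register moves differs.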