http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48037
           Summary: Missed optimization: unnecessary register moves
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: schnet...@gmail.com

Created attachment 23587
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23587
Source code

I want to perform certain operations on an SSE double-precision vector. I am using the intrinsics offered by emmintrin.h to decompose the vector into two scalars, perform the operation on both elements, and reconstruct the vector.

As an example operation I calculate the square root using scalar instructions. I am aware that there is a vector instruction for this; I am only using the scalar version as a placeholder to simplify the code.

I use gcc 4.6.0:

$ g++-mp-4.6 --version
g++-mp-4.6 (GCC) 4.6.0 20110305 (experimental)

on a MacBook Pro:

$ uname -a
Darwin erik-schnetters-macbook-pro.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386

with a 2.66 GHz Intel Core i7 processor, and I compile with the options

$ g++-mp-4.6 -S -O3 -march=native -ffast-math vecmath.cc

I tried four different ways of extracting the scalars from the vector, and I find that gcc generates unnecessary register-register moves in almost every case.
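
For readers without the attachment, the decompose / operate / reconstruct pattern described above might look roughly like the sketch below. This is not the attached vecmath.cc: the function names sqrt2_intrinsics and sqrt2_store are hypothetical, and the two extraction variants shown are only assumptions about what such code could contain, not the four variants from the report.

```cpp
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics (__m128d and friends)

// Hypothetical variant 1: extract both elements with intrinsics only.
static inline __m128d sqrt2_intrinsics(__m128d v) {
    // Element 0 is the low double of the vector.
    double lo = _mm_cvtsd_f64(v);
    // Move element 1 into the low position, then extract it.
    double hi = _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
    // Scalar placeholder operation on each element.
    lo = std::sqrt(lo);
    hi = std::sqrt(hi);
    // _mm_set_pd takes its arguments as (high, low).
    return _mm_set_pd(hi, lo);
}

// Hypothetical variant 2: extract the elements through a temporary array.
static inline __m128d sqrt2_store(__m128d v) {
    double x[2];
    _mm_storeu_pd(x, v);  // x[0] = element 0, x[1] = element 1
    x[0] = std::sqrt(x[0]);
    x[1] = std::sqrt(x[1]);
    return _mm_loadu_pd(x);
}
```

Compiling a file containing such functions with -S, as in the command line above, lets one inspect the generated assembly for extra register-register moves around the scalar square roots.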