Compiling this testcase with '-O2 -msse' produces suboptimal code. 'val1' is merged into a vector at compile time, but it is still stored on the stack. GCC does not detect that the 'val1' value on the stack is sitting there unused.

#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
  float val1 = 1.3f;
  float result[4];
  __m128 A;

  A = _mm_load1_ps(&val1);
  _mm_storeu_ps(result, A);

  printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
  return 0;
}

This code is produced:

...
.LC2:                                   <- merged vector
        .long   1067869798
        .long   0
        .long   0
        .long   0
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $72, %esp
        movss   .LC2, %xmm0             <- vector is loaded into %xmm0
        andl    $-16, %esp
        subl    $16, %esp
        movl    $0x3fa66666, -4(%ebp)   <- 'val1' is put on stack here
        shufps  $0, %xmm0, %xmm0
        movl    $.LC1, (%esp)
        movups  %xmm0, -20(%ebp)
        flds    -8(%ebp)
        fstpl   28(%esp)
        flds    -12(%ebp)
        fstpl   20(%esp)
        flds    -16(%ebp)
        fstpl   12(%esp)
        flds    -20(%ebp)
        fstpl   4(%esp)
        call    printf
        xorl    %eax, %eax
        leave
        ret

An even worse situation arises with:

int main(void) {
  float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};
  float result[4];
  __m128 A;

  A = _mm_loadu_ps(val1);
  _mm_storeu_ps(result, A);

  printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
  return 0;
}

The following asm code is produced:

main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $72, %esp
        andl    $-16, %esp
        movl    $0x3fa66666, -16(%ebp)
        subl    $16, %esp
        movl    $0x3fb33333, -12(%ebp)
        movl    $0x3fc00000, -8(%ebp)
        movl    $0x3fcccccd, -4(%ebp)
        movups  -16(%ebp), %xmm0
        movups  %xmm0, -32(%ebp)
        flds    -20(%ebp)
        fstpl   28(%esp)
        flds    -24(%ebp)
        fstpl   20(%esp)
        flds    -28(%ebp)
        fstpl   12(%esp)
        flds    -32(%ebp)
        fstpl   4(%esp)
        movl    $.LC4, (%esp)
        call    printf
        xorl    %eax, %eax
        leave
        ret

The constant values are not merged into a vector at compile time; the vector is built on the stack and then loaded into an %xmm register. The values on the stack are again left unused after the vector is initialized.

Uros.

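For comparison only, here is a minimal sketch of the same two initializations written with the value-taking intrinsics _mm_set1_ps and _mm_setr_ps, so that no address of a stack variable is taken. The helper names store_broadcast and store_four are hypothetical and not part of the testcases above. Whether this form avoids the dead stores on a given GCC version has not been checked here; it only shows an equivalent way to express the constants. On 32-bit x86 it is assumed to be built with -msse, as above.

```c
#include <xmmintrin.h>

/* Hypothetical helper: broadcast the constant 1.3f into all four lanes,
   equivalent in result to _mm_load1_ps(&val1) in the first testcase. */
static void store_broadcast(float out[4])
{
  __m128 a = _mm_set1_ps(1.3f);
  _mm_storeu_ps(out, a);
}

/* Hypothetical helper: build the vector {1.3f, 1.4f, 1.5f, 1.6f}.
   _mm_setr_ps takes its arguments in memory order, so out[0] is 1.3f,
   matching _mm_loadu_ps(val1) in the second testcase. */
static void store_four(float out[4])
{
  __m128 a = _mm_setr_ps(1.3f, 1.4f, 1.5f, 1.6f);
  _mm_storeu_ps(out, a);
}
```

Calling either helper with a float result[4] buffer and printing the four elements gives the same values as the corresponding testcase; comparing the generated assembly with 'gcc -O2 -msse -S' shows whether the stack stores remain.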
-- 
           Summary: SSE constant vector initialization produces dead constant values on stack
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at gcc dot gnu dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562