------- Comment #10 from hjl dot tools at gmail dot com 2010-06-28 19:17 -------

Here is a small testcase:
[...@gnu-6 44551]$ cat c.s
        .file   "c.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB798:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        vextractf128    $0x1, %ymm0, %xmm0
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE798:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.6.0 20100625 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[...@gnu-6 44551]$

The optimized code is

        vmovaps %xmm1, %xmm0
        ret

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551
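
Why the two AVX instructions in the testcase collapse to a single move can be illustrated with a small plain-C model of the lane semantics. This is only a sketch: the types ymm_t and xmm_t and the functions insert128, extract128 and foo_model are hypothetical names invented for illustration. They are not the intrinsics and not the contents of c.c, which is not shown in this report. The model assumes only the documented behaviour that vinsertf128 $1 overwrites the upper 128 bits of the destination and vextractf128 $1 reads the upper 128 bits of the source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: a 256-bit YMM register as eight floats,
   a 128-bit XMM register as four floats. */
typedef struct { float lane[8]; } ymm_t;
typedef struct { float lane[4]; } xmm_t;

/* Models vinsertf128 $imm: copy base, then overwrite the 128-bit
   half selected by imm (0 = low, 1 = high) with src. */
static ymm_t insert128(ymm_t base, xmm_t src, int imm)
{
    memcpy(&base.lane[(imm & 1) * 4], src.lane, sizeof src.lane);
    return base;
}

/* Models vextractf128 $imm: return the 128-bit half of src
   selected by imm (0 = low, 1 = high). */
static xmm_t extract128(ymm_t src, int imm)
{
    xmm_t out;
    memcpy(out.lane, &src.lane[(imm & 1) * 4], sizeof out.lane);
    return out;
}

/* Models the body of foo in the listing: insert y into the upper
   half of x, then extract that same upper half. */
static xmm_t foo_model(ymm_t x, xmm_t y)
{
    return extract128(insert128(x, y, 1), 1);
}

/* Returns 1 when the round trip hands back y unchanged, i.e. when
   the pair of instructions behaves like a plain register move. */
static int roundtrip_is_copy(ymm_t x, xmm_t y)
{
    xmm_t r = foo_model(x, y);
    return memcmp(r.lane, y.lane, sizeof y.lane) == 0;
}
```

In this model roundtrip_is_copy holds for any x, because the upper half written by the insert is exactly the half read by the extract. That is the reasoning behind replacing the pair with vmovaps %xmm1, %xmm0.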