http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56083
Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-23 16:05:40 UTC ---
(In reply to comment #0)
> Unnecessarily complex machine code is generated on x86-64. Perhaps there is a
> reason for this but to me it seems like the compiler is failing to optimize
> properly. Asm code labels changed and comments added, other than that they are
> are produced by the respective compilers for this C code:

This is a tuning decision. Use -march= for targets that benefit from unaligned
loads and stores:

  /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,

  /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL */
  m_COREI7 | m_BDVER,

-O3 -march=corei7 produces:

        movups  (%rdi), %xmm0
        xorps   .LC0(%rip), %xmm0
        movups  %xmm0, (%rdi)

Which is the same as your hand-optimized code.
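
For readers without comment #0 at hand, here is a minimal sketch of the kind of C function that could plausibly lead to a load / xorps / store sequence like the one above. It is an assumption for illustration only, not the reporter's test case: the name negate4 and the choice of flipping float sign bits are hypothetical. The exact instructions emitted depend on the GCC version and on the -march/-mtune setting.

```c
/* Hypothetical example, NOT the code from comment #0.
 * Flips the sign of four consecutive floats in place. The pointer has
 * no alignment guarantee, so a vectorizing compiler has to decide
 * between unaligned 16-byte accesses and split 8-byte accesses, which
 * is what the tuning flags quoted above control. */
void negate4(float *v)
{
    for (int i = 0; i < 4; i++)
        v[i] = -v[i];   /* a sign flip can be done as an XOR with a sign-bit mask */
}
```

To compare the output under different tunings, one could compile this with, for example, gcc -O3 -S and then again with gcc -O3 -march=corei7 -S, and diff the resulting assembly.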