http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56083



Uros Bizjak <ubizjak at gmail dot com> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID



--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-23 16:05:40 UTC ---

(In reply to comment #0)
> Unnecessarily complex machine code is generated on x86-64. Perhaps there is a
> reason for this, but to me it seems like the compiler is failing to optimize
> properly. Asm code labels changed and comments added; other than that, they
> are produced by the respective compilers for this C code:



This is a tuning decision. Use -march= to select a target that benefits from
unaligned SSE loads and stores:



  /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,

  /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL */
  m_COREI7 | m_BDVER,



-O3 -march=corei7 produces:



        movups  (%rdi), %xmm0
        xorps   .LC0(%rip), %xmm0
        movups  %xmm0, (%rdi)



This is the same as your hand-optimized code.
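For comparison, here is a minimal C sketch of the same unaligned load / xor / store sequence written with SSE intrinsics. The reporter's C source and the contents of .LC0 are not shown in this comment, so this assumes, purely for illustration, that .LC0 is a sign-bit mask and that the code negates four floats in place; the function name negate4 is hypothetical. The intrinsics typically map to movups / xorps / movups, but the exact instructions GCC emits still depend on the -march/-mtune setting discussed above.

```c
#include <xmmintrin.h>

/* Hypothetical example: negate four (possibly unaligned) floats in place.
   Assumes .LC0 above is a sign-bit mask; the original source is not shown. */
void negate4(float *p)
{
    __m128 sign = _mm_set1_ps(-0.0f);  /* only the sign bit set in each lane */
    __m128 v = _mm_loadu_ps(p);        /* unaligned load, typically movups */
    v = _mm_xor_ps(v, sign);           /* flip the sign bits, xorps */
    _mm_storeu_ps(p, v);               /* unaligned store, typically movups */
}
```

Calling negate4(buf + 1) on a float array exercises the unaligned path, since buf + 1 is generally not 16-byte aligned.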
