https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63756
Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
_mm_cvtepi16_epi32 accepts an __m128i operand, and when this operand is loaded
from memory, it has to occupy a 16-byte location with natural 16-byte
alignment.

The fact that "pmovsxwd (%rax), %xmm0" is emitted is an implementation detail
of the intrinsic, but this fact does not relax the requirement for the pointer.
You are casting the pointer to __m128i *, so you have to guarantee that the
pointer is suitably aligned and points to a valid 16-byte memory location.

Please use inline assembly if you want to emit a pmovsxwd that loads from a
64-bit memory location. There is no intrinsic with this functionality
available.

Oh, and nobody will care about long (but otherwise correct) assembly when
optimization is switched off.
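
For illustration only (this is not part of the bug report): one way to stay
within the pointer requirement described above, without inline assembly, is to
load just the low 64 bits explicitly and then widen the register value. The
helper name widen4_i16_to_i32 is hypothetical. Whether the compiler folds the
load into a single pmovsxwd with a memory operand is not guaranteed and depends
on the compiler version and optimization level. The sketch assumes that
_mm_loadl_epi64 is implemented as an unaligned 64-bit load, as it is in recent
GCC headers, and it falls back to a scalar loop when SSE4.1 is not enabled at
compile time (for example, when built without -msse4.1).

```c
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1: _mm_cvtepi16_epi32 */
#endif

/* Hypothetical helper: sign-extend the four int16_t values at src to
 * int32_t. Only 8 bytes are read from src, so src does not need to
 * point to a full, 16-byte aligned __m128i object. */
static void widen4_i16_to_i32(const int16_t *src, int32_t dst[4])
{
#if defined(__SSE4_1__)
    /* Load exactly 64 bits into the low half of an XMM register;
     * the upper half is zeroed. */
    __m128i lo = _mm_loadl_epi64((const __m128i *)src);

    /* The operand is now a register value, so the intrinsic's
     * alignment requirement on memory operands no longer applies. */
    __m128i wide = _mm_cvtepi16_epi32(lo);

    /* Unaligned 16-byte store of the four widened values. */
    _mm_storeu_si128((__m128i *)dst, wide);
#else
    /* Scalar fallback when SSE4.1 is not enabled at compile time. */
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i];
#endif
}
```

Calling widen4_i16_to_i32(buf + 1, out) on an int16_t array is then acceptable
even though buf + 1 is in general only 2-byte aligned, because the helper never
dereferences the pointer as a full __m128i.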