https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66115

--- Comment #7 from carloscastro10 at hotmail dot com ---
It cannot be a requirement. If it were, functions like __m128i _mm_loadu_si128
(__m128i const* mem_addr), which have always relied on mem_addr not
necessarily being aligned, would not work. What has changed is that, before
AVX, SSE arithmetic instructions that took a pointer to __m128i as a memory
operand required that pointer to be aligned on a 16-byte boundary. With AVX,
SSE arithmetic instructions encoded with a VEX prefix (for example vpavgb as
opposed to pavgb) no longer require their memory operands to be aligned.

In other words, if I compile this line:

sse0 = _mm_avg_epu8(sse0,*(__m128i*)(a));

with -msse4.1, then "a" needs to be aligned on a 16-byte boundary.

If I compile it with -mavx, then "a" does not need to be aligned anymore. 

The problem only occurs at -O0, because the compiler first loads the operands
into registers before executing the actual instruction.

-O3 generates this instruction: 
vpavgb  4(%rax), %xmm0, %xmm0

That is a VEX-encoded instruction, so 4(%rax) does not need to be aligned. That
code runs fine and never crashes (it is a very simplified version of code I
have reliably running on a large codebase). 

-O0 generates these instructions:
addq    $4, %rax
vmovdqa (%rax), %xmm0
...
vpavgb  %xmm0, %xmm1, %xmm0

In this case, vmovdqa ALWAYS requires its memory operand to be aligned on a
16-byte boundary. The use of vmovdqa might be based on an incorrect
assumption, dating back to the days before AVX, that the memory location
passed to _mm_avg_epu8 is aligned. When -mavx is selected the compiler cannot
make that assumption and should use vmovdqu instead.
