https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66115
--- Comment #7 from carloscastro10 at hotmail dot com ---
It cannot be a requirement. If it were, functions like
__m128i _mm_loadu_si128 (__m128i const* mem_addr), which have always relied
on mem_addr not necessarily being aligned, would not work.

What has changed is this: before AVX, SSE arithmetic functions that took a
pointer to __m128i as a parameter did require that pointer to be aligned to
a 16-byte boundary. With AVX, SSE arithmetic functions encoded with a VEX
prefix (for example vpavgb as opposed to pavgb) no longer require their
memory operands to be aligned.

In other words, if I compile this line:

  sse0 = _mm_avg_epu8(sse0, *(__m128i*)(a));

with -msse4.1, then "a" needs to be aligned on a 16-byte boundary. If I
compile it with -mavx, then "a" no longer needs to be aligned.

The problem only occurs with -O0, because -O0 loads the operands into
registers before executing the actual instruction. -O3 generates this
instruction:

  vpavgb 4(%rax), %xmm0, %xmm0

That is a VEX-encoded instruction, so 4(%rax) does not need to be aligned.
That code runs fine and never crashes (it is a very simplified version of
code I have running reliably in a large codebase).

-O0 generates these instructions:

  addq    $4, %rax
  vmovdqa (%rax), %xmm0
  ...
  vpavgb  %xmm0, %xmm1, %xmm0

Here vmovdqa ALWAYS requires its memory operand to be aligned to a 16-byte
boundary. The use of vmovdqa might be based on an incorrect assumption,
dating back to the days before AVX, that the memory location passed to
_mm_avg_epu8 is aligned. When -mavx is selected the compiler cannot make
that assumption and should use vmovdqu instead.
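
For reference, a minimal standalone reproducer along the lines of the
snippet above (my own construction, not taken from the original codebase;
the buffer name and sizes are illustrative). On an affected GCC, building
with -mavx -O3 folds the load into a VEX-encoded vpavgb and runs, while
-mavx -O0 emits vmovdqa for the misaligned operand and faults at runtime:

  #include <immintrin.h>
  #include <stdio.h>

  int main(void)
  {
      /* 16-byte-aligned backing buffer; buf + 1 is then deliberately
       * misaligned, but a 16-byte read from it stays in bounds. */
      unsigned char buf[32] __attribute__((aligned(16))) = {0};
      unsigned char *a = buf + 1;   /* misaligned on purpose */

      __m128i sse0 = _mm_set1_epi8(42);

      /* The construct from the report: dereferencing a cast pointer.
       * VEX-encoded vpavgb tolerates an unaligned memory operand, so
       * at -O3 this is legal; at -O0 GCC loads it with vmovdqa first. */
      sse0 = _mm_avg_epu8(sse0, *(__m128i *)(a));

      /* Workaround that is safe at any optimization level: an explicit
       * unaligned load via _mm_loadu_si128, which has always permitted
       * an unaligned mem_addr. */
      __m128i v = _mm_loadu_si128((const __m128i *)a);
      sse0 = _mm_avg_epu8(sse0, v);

      printf("%d\n", _mm_extract_epi8(sse0, 0));
      return 0;
  }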