https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70970
Bug ID: 70970
Summary: Misaligned SSE with auto-vectorization
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rcc.dark at gmail dot com
Target Milestone: ---

Created attachment 38424
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38424&action=edit
the code

Sorry if this has been reported before. I have tested it under:

gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
gcc version 5.2.1 20151028 (Debian 5.2.1-23)
gcc version 6.1.0 (GCC) <----- Windows
gcc version 5.2.0 (GCC) <----- Windows, MinGW-W64

The following code crashes with -std=c++14 -O3:

#include <cstdint>
#include <malloc.h>

template<typename RI>
__attribute__ ((noinline))
void symmetric_difference(RI ai, RI af, RI bi)
{
    while (ai != af) {
        *ai++ ^= *bi++;
    }
}

int main( )
{
    auto p1 = reinterpret_cast<char*>(memalign(4096, 32));
    auto p2 = reinterpret_cast<char*>(memalign(4096, 32)); // _aligned_malloc under Windows

    auto ai = reinterpret_cast<std::uint64_t*>(p1 + 1);
    auto bi = reinterpret_cast<std::uint64_t*>(p2 + 1);

    symmetric_difference(ai, ai + 64, bi);
}

It stops crashing with -O2, or if I remove the + 1 from the pointers. GDB tells
me that the faulting instruction is:

    vmovdqa YMMWORD PTR [rbx+rcx*1],ymm0

The register rbx is not aligned and rcx = 0. It seems that it is reading with
vmovdqu but storing with vmovdqa.
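
For comparison, here is a minimal sketch (not part of the attachment) of how the same XOR loop could be written so that it never dereferences a std::uint64_t* at an odd address. It assumes the one-byte offset is intentional. Dereferencing a pointer that is not suitably aligned for its type is undefined behavior in C++, so this version copies each word through std::memcpy instead. The names is_aligned_for and symmetric_difference_bytes are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: XOR `count` 64-bit words from src into dst.
// Both pointers may have any byte alignment, because every word is
// moved through std::memcpy rather than through a uint64_t lvalue.
inline void symmetric_difference_bytes(unsigned char* dst,
                                       const unsigned char* src,
                                       std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t a;
        std::uint64_t b;
        std::memcpy(&a, dst + i * sizeof a, sizeof a);
        std::memcpy(&b, src + i * sizeof b, sizeof b);
        a ^= b;
        std::memcpy(dst + i * sizeof a, &a, sizeof a);
    }
}
```

With this shape the compiler is still free to vectorize the loop, but it has no basis for assuming 8-byte alignment of dst or src. A call such as symmetric_difference_bytes(p1 + 1, p2 + 1, n) would additionally need both buffers to hold at least 1 + 8 * n bytes.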