[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

Dmitry Kazakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimula73 at gmail dot com

--- Comment #25 from Dmitry Kazakov ---
Hi, all!

I would like to add one more test file related to the problem. When GCC calls
a function that takes a __m256 value as a parameter, it spills this parameter
to the stack using an **aligned** move (vmovaps), but the stack alignment
guarantee on Windows is only 16 bytes. This means the application will crash
on an unaligned memory access whenever the stack slot happens not to be
32-byte aligned.

Affected versions: GCC 7.3.0 (MinGW64), GCC 8.1.0 (MinGW64)

Here is the testing source (see also the attachment):

    #include <immintrin.h>

    struct X {
        alignas(32) __m256 d;
    };

    void g1(X);
    void g2(const X&);
    void g3(const void *);

    void f(float *ptr)
    {
        X x = {_mm256_load_ps(ptr)};
        g1(x);  // BUG: passes via unaligned (whatever rsp alignment is) stack
        g2(x);  // OK: passes via aligned stack location
        g3(&x); // OK: passes via aligned stack location
    }

Compiled result (-O2 -march=skylake):

    _Z1fPf:
    .LFB5135:
            pushq   %rbx
            .seh_pushreg    %rbx
            addq    $-128, %rsp
            .seh_stackalloc 128
            .seh_endprologue
            vmovaps (%rcx), %ymm0
            leaq    95(%rsp), %rbx
            leaq    32(%rsp), %rcx
            andq    $-32, %rbx
            vmovaps %ymm0, (%rbx)       # %rbx is properly aligned
            vmovaps %ymm0, 32(%rsp)     # %rsp may be unaligned
            vzeroupper
            call    _Z2g11X
            movq    %rbx, %rcx
            call    _Z2g2RK1X
            movq    %rbx, %rcx
            call    _Z2g3PKv
            nop
            subq    $-128, %rsp
            popq    %rbx
            ret

Related bug in Vc library: https://github.com/VcDevel/Vc/issues/241
Related bug in Krita: https://bugs.kde.org/show_bug.cgi?id=406209
[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #26 from Dmitry Kazakov ---
Created attachment 46133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46133&action=edit
Test source for unaligned pass-by-value crash

Test file for the comment above
[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #27 from Dmitry Kazakov ---
As a workaround, one can either use __attribute__((always_inline)) for *all*
the functions accepting __m256, or pass *all* such arguments by const-ref.
Const-ref arguments are passed correctly.
[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

Dmitry Kazakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimula73 at gmail dot com

--- Comment #31 from Dmitry Kazakov ---
Hi, all!

Just wanted to note that the bug is still present in GCC 10.3.0 on Windows
(from MSYS-MinGW64 packages).

> gcc (Rev5, Built by MSYS2 project) 10.3.0
[Bug target/54412] minimal 32-byte stack alignment with -mavx on 64-bit Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412

--- Comment #42 from Dmitry Kazakov ---
Hi, Avraham!

> Does it remain true that the only option to get around this bug without
> killing all AVX2 is to pass "-Wa,-muse-unaligned-vector-move" when compiling
> using GCC on Windows 64? Thank you

I'm not sure about your particular issue, but in our case we managed to work
around this issue by passing AVX2-related structures by reference (or
const-reference, when possible).