https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91043
--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
The asm dump claims that the access is aligned to 32 bytes (A256 in the MEM
attributes, i.e. 256 bits):

#(insn 14 31 9 2 (set (mem:V4DI (plus:DI (reg/f:DI 3 bx [orig:90 this ] [90])
#                (const_int 64 [0x40])) [6 MEM[(long unsigned int *)this_6(D) + 64B]+0 S32 A256])
#        (reg:V4DI 21 xmm0 [92])) "../../src/stateful_rx_core.cpp":254 1228 {movv4di_internal}
#     (nil))
        vmovdqa %ymm0, 64(%rbx) # 14    movv4di_internal/4      [length = 5]

which gets expanded from:

;; MEM[(long unsigned int *)this_6(D) + 64B] = { 0, 0, 0, 0 };

(insn 13 12 14 (set (reg:V4DI 92)
        (const_vector:V4DI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) "../../src/stateful_rx_core.cpp":254 -1
     (nil))

(insn 14 13 0 (set (mem:V4DI (plus:DI (reg/f:DI 90 [ this ])
                (const_int 64 [0x40])) [6 MEM[(long unsigned int *)this_6(D) + 64B]+0 S32 A256])
        (reg:V4DI 92)) "../../src/stateful_rx_core.cpp":254 -1
     (nil))

So, not a target issue.
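
For illustration only, here is a minimal sketch of what the A256 claim means at run time: an aligned 32-byte store such as vmovdqa to `this + 64` is only valid if that address is a multiple of 32. Nothing below comes from the bug report. The struct `Core` and the helpers `is_aligned` and `store_at_64_is_safe` are hypothetical names, and the layout is just an assumption chosen to match the +64 offset and 32-byte size seen in the dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the object behind `this` in the dump:
// four 64-bit words, declared 32-byte aligned, 64 bytes into the object.
struct Core {
    unsigned char header[64];
    alignas(32) std::uint64_t words[4];
};

static_assert(alignof(Core) == 32, "the over-aligned member raises the type's alignment");
static_assert(offsetof(Core, words) == 64, "matches the +64 offset in the dump");

// True if p is a multiple of `alignment` bytes.
// `alignment` must be a nonzero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Would an aligned 32-byte store to base + 64 be valid for this base address?
inline bool store_at_64_is_safe(const void* base) {
    return is_aligned(static_cast<const unsigned char*>(base) + 64, 32);
}
```

Calling `store_at_64_is_safe(this)` from a debug build is one way to check whether the object's actual address honours the alignment the compiler assumed. If it returns false, the storage for the object was obtained with less alignment than its type requires, which would fit the conclusion above that the backend is only acting on what it was told.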