https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80689
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that using unaligned 128-bit moves might incur an even larger STLF (store-to-load forwarding) penalty than aligned loads/stores would, because an unaligned access might cross a cache-line boundary and store queues are usually organized around cache lines.
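
To make the cache-line concern concrete, here is a minimal sketch of a check for whether a 16-byte (128-bit) access at a given address straddles a cache-line boundary. It assumes 64-byte cache lines, which is typical for current x86 parts but is not stated in this report. The helper name crosses_cache_line and the CACHE_LINE_SIZE macro are made up for illustration and are not part of GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; adjust for the target in question. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: returns 1 if the byte range [addr, addr + size)
 * touches more than one cache line, 0 otherwise.
 */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;              /* line of first byte */
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;  /* line of last byte */
    return first_line != last_line;
}
```

A 16-byte access that is 16-byte aligned can never split across 64-byte lines, while an unaligned one splits whenever its offset within the line is greater than 48. Whether a split actually costs more in the STLF case depends on the microarchitecture.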