https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65369
--- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alan Modra from comment #28)
> Created attachment 35024 [details]
> modified testcase without bswap optimization
>
> This modified testcase avoids triggering the bswap optimization but still
> shows a failure at -O3.  So definitely not a problem caused by Thomas' patch.
>
> -O3 -fno-tree-slp-vectorize is OK
>
> -O3 slp dump shows weird offset of +12 between vector loads rather than +16
> as is usual

Well, that's the realign-load sequence.  Load ptr & ~15 and then load
(ptr + 12) & ~15.  If ptr is already aligned both loads should load from the
same location.

SLP happens in md4_update only (but it's quite trivial).
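
To illustrate the address arithmetic described above, here is a minimal sketch in C.  It is not GCC code: the helper names are hypothetical, and it assumes 16-byte vectors with 4-byte elements (which is presumably where the +12 comes from, i.e. 16 - 4).  It only shows which aligned addresses the two loads of the realign-load sequence would touch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not taken from GCC.
   Assumes 16-byte vector alignment.  */

/* Address of the first aligned load: ptr & ~15.  */
static uintptr_t realign_first(uintptr_t ptr)
{
    return ptr & ~(uintptr_t)15;
}

/* Address of the second aligned load: (ptr + 12) & ~15.  */
static uintptr_t realign_second(uintptr_t ptr)
{
    return (ptr + 12) & ~(uintptr_t)15;
}

/* Self-check on a few sample addresses (values chosen arbitrarily).  */
static void check_realign_examples(void)
{
    /* Already aligned: both loads hit the same 16-byte block.  */
    assert(realign_first(0x1000) == 0x1000);
    assert(realign_second(0x1000) == 0x1000);

    /* Misaligned by 4, 8 or 12 bytes: the second load is the next block.  */
    assert(realign_first(0x1004) == 0x1000);
    assert(realign_second(0x1004) == 0x1010);
    assert(realign_second(0x1008) == 0x1010);
    assert(realign_second(0x100c) == 0x1010);
}
```

Under these assumptions the +12 offset in the dump does not by itself mean the loads are 12 bytes apart: after masking, the two loads are either 0 or 16 bytes apart.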