https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #5)
> I'm sure a microbench would show that makes a difference.

A micro-benchmark on skylake with -march=native (using just -mavx2 is worse for gcc without affecting clang) seems to indicate that the speed difference is within the noise level, consistently whether the data is aligned or not (the only case where the difference was obvious was when the buffer did not even have the alignment for a double, where clang won with a large margin, but that doesn't count).
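
For readers who want to try a comparison of this kind, here is a minimal sketch of an aligned-versus-offset timing harness. It is not the benchmark used for the numbers above: the kernel `scale_add`, the helpers `time_kernel` and `run_case`, the 64-byte base alignment and the one-double offset are all assumptions chosen for illustration. The case where the buffer lacks even the alignment of a double is left out on purpose, since forming such a pointer is undefined behavior in C.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel: a simple loop that a vectorizer is likely to handle. */
static void scale_add(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += 2.0 * src[i];
}

/* Run the kernel `reps` times and return the CPU time used, in seconds.
 * A serious benchmark would also keep the compiler from merging or hoisting
 * the repeated calls, and would repeat the measurement to estimate noise. */
static double time_kernel(double *dst, const double *src, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        scale_add(dst, src, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Allocate two 64-byte-aligned buffers, shift both by `offset_doubles`
 * elements (0 keeps them 64-byte aligned, 1 leaves only 8-byte alignment),
 * time the kernel, and report a checksum so the work cannot be discarded.
 * Returns 0 on success, -1 if allocation fails. */
static int run_case(size_t n, int reps, size_t offset_doubles,
                    double *seconds, double *checksum)
{
    /* aligned_alloc wants the size to be a multiple of the alignment. */
    size_t bytes = ((n + offset_doubles) * sizeof(double) + 63) / 64 * 64;
    double *a = aligned_alloc(64, bytes);
    double *b = aligned_alloc(64, bytes);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    double *dst = a + offset_doubles;
    double *src = b + offset_doubles;
    for (size_t i = 0; i < n; ++i) {
        dst[i] = 0.0;
        src[i] = 1.0;
    }

    *seconds = time_kernel(dst, src, n, reps);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += dst[i];
    *checksum = sum;

    free(a);
    free(b);
    return 0;
}
```

Calling `run_case(n, reps, 0, &t, &sum)` and `run_case(n, reps, 1, &t, &sum)` from a `main` function, building once with each compiler at the same options (for example `-O3 -march=native`), and comparing the timings over several runs gives a rough idea of whether any difference rises above the noise.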