https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---

Cool, that matches pretty much exactly the analysis I had posted on Stack Overflow ;-)

A separate issue from whether we can somehow propagate the alignment information is what we do without the alignment information (remove the attribute to be sure). GCC generates rather large code, with scalar and vector loops, to try to reach an aligned position for one of the buffers (the other one still requires potentially unaligned access) and then perform at most 2 vector iterations. On the other hand, clang+llvm don't care about alignment and generate unaligned vector operations, totally unrolled (that's 2 vector iterations since there were 8 scalar iterations initially), for a grand total of 6 insns (with AVX).

I have a hard time believing that GCC's complicated code is ever faster than clang's, whether the arrays are aligned or not. We can discuss that in a separate PR if this one should center on alignment.
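For anyone who wants to compare the two code generators, here is a minimal sketch of the kind of loop being discussed. It is not the PR's actual test case: the function name `add8`, the `double` element type, the `+=` operation and the `restrict` qualifiers are all assumptions, chosen only so that 8 scalar iterations correspond to 2 iterations of a 4 x double AVX vector and so that alias checks stay out of the picture. No alignment attribute is used, which is the situation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop under discussion:
   8 scalar iterations, i.e. 2 iterations of a 4 x double AVX vector. */
enum { COUNT = 8 };

/* No alignment attribute on either pointer, so the compiler knows nothing
   about how dst and src are aligned. restrict is an assumption made here
   so that the two buffers are known not to overlap. */
void add8(double *restrict dst, const double *restrict src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < COUNT; ++i)
        dst[i] += src[i];
}
```

Compiling this with something like `-O3 -mavx -S` under both gcc and clang should make it possible to look at the generated assembly side by side; what each compiler emits will depend on its version and options.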