https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108
--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Ok, so having looked at this, I'm not sure the compiler is at fault here.

Similar to the SVN case, the snappy code misaligns the loads intentionally and loads 64 bits at a time through the 8-bit pointer:

https://github.com/google/snappy/blob/32ded457c0b1fe78ceb8397632c416568d6714a0/snappy.cc#L1002 shifts the starting point 4 bytes in, and
https://github.com/google/snappy/blob/32ded457c0b1fe78ceb8397632c416568d6714a0/snappy-internal.h#L337 uses `UNALIGNED_LOAD64`.

The vectorizer has to increase the VF to vectorize this loop, and it also vectorizes based on DImode. In addition, to vectorize the loop the vectorizer has to insert peeling, alias and element-count checks, and these are based on the 32-byte requirements of the vector loop.

This means the vector code requires a very high alignment and a large number of elements before it can be entered, and it never is. The vector code is thus cold.

So I think this is a case where the compiler can't do anything. (I also think that the C code relies on UB, similar to SVN: they misalign the byte array to 4 bytes but load 8 bytes at a time. They get lucky that the vector code is never entered.)

The code would benefit if they:

1. added restrict to the functions; e.g. `FindMatchLengthPlain` is manually vectorized anyway, so aliasing must not be a problem.
2. had a simple scalar loop variant that is left up to the vectorizer to vectorize. This would actually give them faster code and allow e.g. SVE codegen.

So I'm tempted to say this is a WONTFIX, as there's nothing the compiler can do here.
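To illustrate the two suggestions, here is a minimal sketch in C. It is not snappy's code: `load64_unaligned` and `find_match_length_simple` are hypothetical names invented for this example, and the signatures are assumptions. The first function shows a `memcpy`-based 8-byte read, which is well defined at any alignment. The second shows a plain byte-wise match loop with `restrict`-qualified pointers, of the kind that could be left to the vectorizer. Whether a given compiler actually vectorizes the early-exit loop depends on its version, target and flags; with GCC, `-O3 -fopt-info-vec` reports what was vectorized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from snappy): read 8 bytes from an address
   of any alignment. memcpy is well defined here regardless of how p is
   aligned, unlike dereferencing a casted uint64_t pointer. */
static inline uint64_t load64_unaligned(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical scalar variant (not from snappy): count how many leading
   bytes of s1 and s2 are equal, up to limit. The restrict qualifiers
   are the caller's promise that the two buffers do not alias in a way
   that matters, so no runtime alias check is needed. */
static size_t find_match_length_simple(const unsigned char *restrict s1,
                                       const unsigned char *restrict s2,
                                       size_t limit)
{
    assert(s1 != NULL && s2 != NULL);

    size_t matched = 0;
    while (matched < limit && s1[matched] == s2[matched])
        matched++;
    return matched;
}
```

The scalar loop makes no assumption about alignment or about a minimum number of elements, so any vector version the compiler generates for it carries only the checks the compiler itself decides it needs.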