https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119108

--- Comment #8 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Ok, so having looked at this I'm not sure the compiler is at fault here.

Similar to the SVN case, the snappy code intentionally misaligns the loads,
reading 64 bits at a time through the 8-bit pointer:

https://github.com/google/snappy/blob/32ded457c0b1fe78ceb8397632c416568d6714a0/snappy.cc#L1002
shifts the starting point 4 bytes in and
https://github.com/google/snappy/blob/32ded457c0b1fe78ceb8397632c416568d6714a0/snappy-internal.h#L337
uses `UNALIGNED_LOAD64`.
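
To make the access pattern concrete, here is a minimal sketch of a 64-bit read
through a byte pointer that sits 4 bytes into a buffer. It assumes a
memcpy-based load; `load_u64_unaligned` is a hypothetical helper for
illustration, not snappy's actual `UNALIGNED_LOAD64` definition.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes starting at p, whatever p's alignment.
   memcpy keeps the access well defined for any byte address. */
static uint64_t load_u64_unaligned(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling it as `load_u64_unaligned(buf + 4)` reads bytes 4..11 of `buf`, so the
address is generally only 4-byte aligned even when `buf` itself is 8-byte
aligned.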

The vectorizer has to increase the VF to vectorize this loop, and it also
vectorizes based on DImodes.

Additionally, to vectorize, the vectorizer has to insert peeling, alias, and
element-count checks for the loop, but sized for the requirements of a 32-byte
loop.

This means the vector code requires a very high alignment and number of entries
to enter, and it never does.  The vector code is thus cold.

So I think this is a case where the compiler can't do anything. (I also think
that the C code relies on UB, similar to SVN: they misalign the byte array by 4
bytes but load 8 bytes at a time. They get lucky that the vector code is never
entered.)

The code would benefit if they:

1. added restrict to the functions, as e.g. `FindMatchLengthPlain` is
manually vectorized anyway, so aliasing must not be a problem
2. had a simple scalar loop variant that's left up to the vectorizer to
vectorize.  This would actually give them faster code and allow e.g. SVE
codegen.
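
As a rough sketch of what the two suggestions combined could look like, here is
a plain byte-wise match-length loop with restrict-qualified pointers.
`find_match_length_scalar` and its signature are hypothetical, not snappy's
code, and whether a given GCC version and target actually vectorize this
early-exit loop is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical scalar variant: count how many leading bytes of s1 and s2
   are equal, looking at no more than limit bytes.  Both buffers are only
   read, and restrict tells the compiler it need not worry about aliasing. */
size_t find_match_length_scalar(const unsigned char *restrict s1,
                                const unsigned char *restrict s2,
                                size_t limit)
{
    size_t matched = 0;
    while (matched < limit && s1[matched] == s2[matched])
        ++matched;
    return matched;
}
```

Because every access is a single byte, there is no misaligned wide load in the
source, and the choice of vector width is left entirely to the compiler.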

So I'm tempted to say this is a WONTFIX as there's nothing the compiler can do
here.
