[Bug target/102404] Loop vectorized with 32 byte vectors actually uses 16 byte vectors

freddie at witherden dot org via Gcc-bugs Mon, 20 Sep 2021 05:17:01 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404


--- Comment #3 from Freddie Witherden <freddie at witherden dot org> ---
(In reply to Richard Biener from comment #2)
> 32 bytes are 256 bits (ymm), 64 bytes are 512 bits (zmm).  GCC does not
> consider zmm vectorization because
> 
> t.c:25:37: missed:  loop does not have enough iterations to support
> vectorization.
> 
> because
> 
> t.c:25:37: note:  vectorization_factor = 16, niters = 8
> 
> the memory accesses cannot be related so we fail to SLP this.
> 
> Does clang use vpgathers/scatters on %zmm here?

Apologies for the typo.  However, I would still expect the loop to be
vectorized with %zmm, as it fundamentally operates on double-precision
numbers (8 bytes each) with a trip count of 8, which exactly fills one
64-byte vector.

Clang assembly is attached, showing the expected structure.  Gathers and
scatters are used on %zmm here.

Could GCC be thrown off by the fact that the table indices are 4-byte integers?
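
To make the arithmetic behind that question concrete, here is a minimal
sketch.  It is not the attached t.c: the kernel scale_indexed, its parameter
names and the helper lanes_per_vector are hypothetical, and the idea that the
vectorization factor follows the narrowest type in the loop is only my
assumption about where the reported value of 16 might come from.

```c
#include <stddef.h>

#define N 8  /* trip count mentioned in the report */

/* Hypothetical helper: how many elements of a given size fit in one vector. */
static size_t lanes_per_vector(size_t vector_bytes, size_t elem_bytes)
{
    return vector_bytes / elem_bytes;
}

/* Hypothetical kernel, NOT the reporter's test case: gather doubles through
 * a table of 4-byte indices, scale them, and scatter the results through a
 * second table of 4-byte indices. */
static void scale_indexed(double *restrict dst, const double *restrict src,
                          const int *restrict idx_in,
                          const int *restrict idx_out, double a)
{
    for (int i = 0; i < N; i++)
        dst[idx_out[i]] = a * src[idx_in[i]];
}
```

With 64-byte vectors the helper gives 8 lanes for an 8-byte double but 16
lanes for a 4-byte index, which lines up with niters = 8 and
vectorization_factor = 16 in the dump quoted above.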
