https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404
--- Comment #3 from Freddie Witherden <freddie at witherden dot org> ---
(In reply to Richard Biener from comment #2)
> 32 bytes are 256 bits (ymm), 64 bytes are 512 bits (zmm). GCC does not
> consider zmm vectorization because
>
> t.c:25:37: missed: loop does not have enough iterations to support
> vectorization.
>
> because
>
> t.c:25:37: note: vectorization_factor = 16, niters = 8
>
> the memory accesses cannot be related so we fail to SLP this.
>
> Does clang use vpgathers/scatters on %zmm here?

Apologies for the typo. However, I would still expect the loop to be
vectorized with %zmm, as it fundamentally operates on double-precision
numbers (8 bytes each) with a trip count of 8, which is exactly 64 bytes.

The Clang assembly is attached and shows the expected structure. Gathers
and scatters are used on %zmm here.

Could GCC be thrown off by the fact that the table indices are 4-byte
integers?
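
For readers without the attachment, here is a minimal sketch of the kind of loop being discussed. It is not the attached t.c. The names gather_scale_scatter, src_idx, dst_idx and demo are hypothetical, and the scaling operation is only an assumption chosen to keep the example short. What it shares with the report is the shape: doubles addressed through 4-byte int index tables, with a fixed trip count of 8.

```c
#include <assert.h>

enum { N = 8 }; /* trip count from the report: 8 doubles = 64 bytes = one %zmm */

/* Hypothetical kernel: gather doubles through 4-byte indices, scale them,
   and scatter the results. The entries of dst_idx are assumed to be
   distinct, so the iterations are independent of each other. */
void gather_scale_scatter(double *restrict dst, const double *restrict src,
                          const int *restrict src_idx,
                          const int *restrict dst_idx, double a)
{
    for (int i = 0; i < N; i++)
        dst[dst_idx[i]] = a * src[src_idx[i]];
}

/* Small self-check: read src in reverse order, scale by 2, store in order.
   Returns 0 on success and 1 on a mismatch. */
int demo(void)
{
    double src[N], dst[N];
    int src_idx[N], dst_idx[N];

    for (int i = 0; i < N; i++) {
        src[i] = (double)i;
        dst[i] = -1.0;
        src_idx[i] = N - 1 - i; /* gather in reverse */
        dst_idx[i] = i;         /* scatter in order  */
        assert(src_idx[i] >= 0 && src_idx[i] < N);
    }

    gather_scale_scatter(dst, src, src_idx, dst_idx, 2.0);

    for (int i = 0; i < N; i++)
        if (dst[i] != 2.0 * (N - 1 - i))
            return 1;
    return 0;
}
```

On the arithmetic: a 64-byte vector holds 8 doubles but 16 4-byte ints, so a vectorization factor of 16 is what one would get if the factor were derived from the index type rather than the data type. Whether that is what actually happens inside GCC here is the open question, not something this sketch establishes. To look at the vectorizer's diagnostics for a loop like this, something along the lines of -O3 -march=skylake-avx512 -mprefer-vector-width=512 -fopt-info-vec-missed should work; those flags are my assumption and not necessarily the ones used for the original report.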