https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
non-openmp testcase for the AVX512 mask inefficiency:

unsigned foo (unsigned *a, unsigned short mask)
{
  unsigned sum = 0;
  for (int i = 0; i < 16; ++i)
    if ((mask >> i) & 1)
      sum += a[i];
  return sum;
}

I think the AVX2 one is settled as being as good as it can get.
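
For anyone who wants to check generated code against expected results, here is a minimal, hedged harness sketch. foo is copied from the testcase above. foo_ref and check_mask are hypothetical helpers that are not part of the bug report: foo_ref spells out the per-lane "bit i of mask selects a[i]" semantics in branchless scalar form, and check_mask asserts that both functions agree. Note that foo_ref reads all 16 elements unconditionally, so it is only equivalent when the whole array is readable.

```c
#include <assert.h>

/* Testcase from the comment, unchanged in behavior. */
unsigned foo (unsigned *a, unsigned short mask)
{
  unsigned sum = 0;
  for (int i = 0; i < 16; ++i)
    if ((mask >> i) & 1)
      sum += a[i];
  return sum;
}

/* Hypothetical reference (an assumption, not from the report):
   turn bit i of mask into an all-ones or all-zeros lane value
   and AND it with a[i].  Unlike foo, this loads every a[i].  */
unsigned foo_ref (const unsigned *a, unsigned short mask)
{
  unsigned sum = 0;
  for (int i = 0; i < 16; ++i)
    {
      unsigned lane = -(unsigned) ((mask >> i) & 1);
      sum += a[i] & lane;
    }
  return sum;
}

/* Hypothetical helper: assert both versions agree for one mask
   and return the common result.  */
unsigned check_mask (unsigned *a, unsigned short mask)
{
  unsigned r = foo (a, mask);
  assert (r == foo_ref (a, mask));
  return r;
}
```

Calling check_mask over all 65536 mask values on a fixed 16-element array gives an exhaustive comparison that is cheap to run after each compiler change.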