https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85833
            Bug ID: 85833
           Summary: [AVX512] use mask registers instructions instead of scalar code
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Here is a simple function that checks whether a vector contains any
non-zero element:

---ktest.c---
#include <immintrin.h>

int anynonzero_epi32(__m512i x) {
    const __m512i   zero = _mm512_setzero_si512();
    const __mmask16 mask = _mm512_cmpneq_epi32_mask(x, zero);
    return mask != 0;
}
---eof---

$ gcc --version
gcc (Debian 7.3.0-16) 7.3.0

$ gcc -O2 -S -mavx512f ktest.c && cat ktest.s

anynonzero_epi32:
        vpxord  %zmm1, %zmm1, %zmm1
        vpcmpd  $4, %zmm1, %zmm0, %k1
        kmovw   %k1, %eax       # <<< HERE
        testw   %ax, %ax        #
        setne   %al
        movzbl  %al, %eax
        vzeroupper
        ret

The problem is that GCC copies the contents of mask register k1 into a
GPR (using the KMOV instruction) and then performs the test there.
AVX-512 has the instruction KTEST kx, ky, which sets ZF and CF directly
from the mask registers:

    ZF = (kx AND ky) == 0
    CF = (kx AND NOT ky) == 0

(The 16-bit form KTESTW belongs to AVX512DQ rather than AVX512F. With
-mavx512f alone, KORTESTW k1, k1 yields the same ZF, because
(k1 OR k1) == 0 exactly when k1 == 0.)

In this case we might use KTEST k1, k1 to set ZF when k1 == 0. The
procedure might then be compiled as:

anynonzero_epi32:
        vpxord  %zmm1, %zmm1, %zmm1
        vpcmpd  $4, %zmm1, %zmm0, %k1
        xor     %eax, %eax      #
        ktestw  %k1, %k1        #
        setne   %al             #
        vzeroupper
        ret
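
To make the flag argument above easier to check, here is a minimal scalar sketch in portable C. It models only the ZF result of the mask-test instructions on 16-bit masks and does not execute any AVX-512 code. The helper names (cmpneq_zero_mask, ktestw_zf, kortestw_zf, anynonzero_model) are hypothetical and exist only for this illustration; they are not GCC or Intel APIs.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for _mm512_cmpneq_epi32_mask(x, zero):
 * bit i of the result is set when 32-bit lane i is non-zero. */
static uint16_t cmpneq_zero_mask(const int32_t lanes[16]) {
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (lanes[i] != 0) {
            mask |= (uint16_t)(1u << i);
        }
    }
    return mask;
}

/* ZF as described in the report for KTEST kx, ky: set when (kx AND ky) == 0. */
static int ktestw_zf(uint16_t kx, uint16_t ky) {
    return (kx & ky) == 0;
}

/* ZF for KORTESTW kx, ky: set when (kx OR ky) == 0. */
static int kortestw_zf(uint16_t kx, uint16_t ky) {
    return (kx | ky) == 0;
}

/* Model of the proposed sequence "ktestw %k1, %k1; setne %al":
 * the result is 1 exactly when ZF is clear, i.e. when k1 != 0. */
static int anynonzero_model(const int32_t lanes[16]) {
    uint16_t k1 = cmpneq_zero_mask(lanes);
    return !ktestw_zf(k1, k1);
}
```

With both operands equal to k1, ktestw_zf and kortestw_zf agree for every mask value, which is why either instruction could replace the kmovw/testw pair in this function. They differ once the operands are distinct: for 0x00F0 and 0x000F the AND is zero but the OR is not.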