https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85833

            Bug ID: 85833
           Summary: [AVX512] use mask registers instructions instead of
                    scalar code
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider a simple function that checks whether a vector contains any
non-zero element:

---ktest.c---
#include <immintrin.h>

int anynonzero_epi32(__m512i x) {
    const __m512i   zero = _mm512_setzero_si512();
    const __mmask16 mask = _mm512_cmpneq_epi32_mask(x, zero);
    return mask != 0;
}
---eof---

$ gcc --version
gcc (Debian 7.3.0-16) 7.3.0

$ gcc -O2 -S -mavx512f ktest.c && cat ktest.s

anynonzero_epi32:
    vpxord  %zmm1, %zmm1, %zmm1
    vpcmpd  $4, %zmm1, %zmm0, %k1
    kmovw   %k1, %eax               # <<< HERE
    testw   %ax, %ax                #
    setne   %al
    movzbl  %al, %eax
    vzeroupper
    ret

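The problem is that GCC copies the contents of mask register k1 into a
GPR (using KMOV) and then performs the test there. Note that the 16-bit
form KTESTW requires AVX512DQ rather than AVX512F; with -mavx512f alone,
KORTESTW k1, k1 yields the same ZF. The instruction KTEST kx, ky sets
ZF and CF: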

    ZF = (kx AND ky) == 0
    CF = (kx AND NOT ky) == 0

In this case KTEST k1, k1 sets ZF exactly when k1 == 0, so the function
could be compiled as:

anynonzero_epi32:
    vpxord  %zmm1, %zmm1, %zmm1
    vpcmpd  $4, %zmm1, %zmm0, %k1
    xor     %eax, %eax              #
    ktestw  %k1, %k1                #
    setne   %al                     #
    vzeroupper
    ret
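
The flag reasoning above can be sanity-checked with a small scalar model.
This is only a sketch: the names ktest_flags, ktestw_same and
anynonzero_from_mask are hypothetical, and the model covers just the
case where both operands are the same mask register, as in
"ktestw %k1, %k1". It does not execute any AVX-512 instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the flags KTESTW produces when both
 * operands are the same 16-bit mask (the "ktestw %k1, %k1" case). */
struct ktest_flags {
    int zf; /* 1 when (k AND k) == 0, i.e. when k == 0 */
    int cf; /* 1 when (k AND NOT k) == 0, which always holds */
};

struct ktest_flags ktestw_same(uint16_t k) {
    struct ktest_flags f;
    f.zf = ((uint16_t)(k & k) == 0);
    f.cf = ((uint16_t)(k & (uint16_t)~k) == 0);
    return f;
}

/* Value that "setne %al" would leave in eax after the test:
 * 1 if any mask bit is set, 0 otherwise. */
int anynonzero_from_mask(uint16_t mask) {
    return !ktestw_same(mask).zf;
}

/* Small self-check of the model (assumed helper, not part of the report). */
void model_self_check(void) {
    assert(anynonzero_from_mask(0x0000) == 0);
    assert(anynonzero_from_mask(0x0001) == 1);
}
```

With identical operands CF carries no information, so only ZF matters
here, which is why SETNE alone is enough in the proposed sequence.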
