https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

            Bug ID: 88798
           Summary: AVX512BW code does not use bit-operations that work on
                    mask registers
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Hi!

AVX512BW-related issue: GCC generates superfluous moves from 64-bit mask
registers to 64-bit GPRs and performs basic bit-ops there, although
AVX512BW provides bit-ops that work directly on mask registers
(instructions: korq, kandq, kxorq).

I guess the main reason is that C does not define a bit-or for the type
__mmask64, so there's always an implicit conversion to uint64_t.

Below is a sample program compiled for Cannon Lake --- the CPU does have
(at least) AVX512BW, AVX512VBMI and AVX512VL.

---perf.c---
#include <immintrin.h>
#include <stdint.h>

uint64_t any_whitespace(__m512i string) {
    return _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8(' '))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\n'))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\r'));
}
---eof---
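
For reference, the value any_whitespace is expected to return can be
described with a plain scalar loop. This is only a sketch for illustration:
the name any_whitespace_scalar and the 64-byte array parameter are
hypothetical and not part of the original test case. It assumes that bit i
of the mask corresponds to byte i of the vector, with byte 0 being the
lowest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not from the original report):
   bit i of the result is set when bytes[i] is ' ', '\n' or '\r'. */
uint64_t any_whitespace_scalar(const unsigned char bytes[64]) {
    assert(bytes != NULL);

    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        unsigned char c = bytes[i];
        /* Same three comparisons as in perf.c, OR-ed together per byte. */
        if (c == ' ' || c == '\n' || c == '\r')
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

The three per-byte comparisons correspond to the three vpcmpub results; the
bit-or between them is the operation that could stay in mask registers.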

$ gcc --version
gcc (Debian 8.2.0-13) 8.2.0

$ gcc perf.c -O3 -march=cannonlake -S
$ cat perf.s # trimmed
any_whitespace:
        vpcmpub $0, .LC0(%rip), %zmm0, %k1
        vpcmpub $0, .LC1(%rip), %zmm0, %k2
        vpcmpub $0, .LC2(%rip), %zmm0, %k3
        kmovq   %k1, %rcx
        kmovq   %k2, %rdx
        orq     %rcx, %rdx
        kmovq   %k3, %rax
        orq     %rdx, %rax
        vzeroupper
        ret

I'd rather expect to get something like:

any_whitespace:
        vpcmpub $0, .LC0(%rip), %zmm0, %k1
        vpcmpub $0, .LC1(%rip), %zmm0, %k2
        vpcmpub $0, .LC2(%rip), %zmm0, %k3
        korq    %k1, %k2, %k1
        korq    %k1, %k3, %k3
        kmovq   %k3, %rax
        vzeroupper
        ret

best regards
Wojciech
