https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
Bug ID: 88798
Summary: AVX512BW code does not use bit-operations that work on mask registers
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Hi!
AVX512BW-related issue: the C compiler generates superfluous moves from 64-bit
mask registers to 64-bit GPRs and then performs basic bit-ops there, even
though AVX512BW provides bit-ops that work directly on mask registers
(instructions: korq, kandq, kxorq).
I guess the main reason is that C does not define a bitwise OR for the type
__mmask64, so there is always an implicit conversion to uint64_t.
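A minimal sketch illustrating the point above, assuming GCC or clang on x86-64: in both compilers' headers `__mmask64` is declared as a plain `unsigned long long` typedef rather than a distinct mask type, so `a | b` on two masks is just an ordinary 64-bit integer OR at the C level. The helper name `mask_or_is_plain_integer` is hypothetical and only exists to make this checkable.

```c
#include <immintrin.h>
#include <stdint.h>

/* Evaluates to 1 if the expression has type unsigned long long. */
#define IS_ULL(x) _Generic((x), unsigned long long: 1, default: 0)

/* Hypothetical helper: returns 1 when OR-ing two __mmask64 values yields
   a plain 64-bit unsigned integer. No AVX-512 instructions are executed,
   so this builds and runs without -mavx512bw. */
int mask_or_is_plain_integer(void) {
    __mmask64 a = 0x0F, b = 0xF0;
    return IS_ULL(a | b) && sizeof(__mmask64) == sizeof(uint64_t);
}
```

Because the C type carries no "this lives in a k-register" information, it is left to the compiler's register allocation whether the OR ends up as korq or as orq.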
Below is a sample program compiled for Cannon Lake; that CPU has (at least)
AVX512BW, AVX512VBMI and AVX512VL.
---perf.c---
#include <immintrin.h>
#include <stdint.h>
uint64_t any_whitespace(__m512i string) {
    return _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8(' '))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\n'))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\r'));
}
---eof---
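For reference, a portable scalar model of what any_whitespace computes, assuming the usual bit layout of `_mm512_cmpeq_epu8_mask` (bit i of the mask corresponds to byte i of the 64-byte vector). The function name `any_whitespace_scalar` is hypothetical; it is meant only as a way to check the vector results, not as part of the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model: bit i of the result is set iff byte i of the 64-byte
   block is ' ', '\n' or '\r', i.e. the OR of the three per-character
   comparison masks. */
uint64_t any_whitespace_scalar(const uint8_t block[64]) {
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        uint8_t c = block[i];
        if (c == ' ' || c == '\n' || c == '\r')
            mask |= UINT64_C(1) << i;
    }
    return mask;
}
```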
$ gcc --version
gcc (Debian 8.2.0-13) 8.2.0
$ gcc perf.c -O3 -march=cannonlake -S
$ cat perf.s # redacted
any_whitespace:
vpcmpub $0, .LC0(%rip), %zmm0, %k1
vpcmpub $0, .LC1(%rip), %zmm0, %k2
vpcmpub $0, .LC2(%rip), %zmm0, %k3
kmovq %k1, %rcx
kmovq %k2, %rdx
orq %rcx, %rdx
kmovq %k3, %rax
orq %rdx, %rax
vzeroupper
ret
I would expect something like this instead:
any_whitespace:
vpcmpub $0, .LC0(%rip), %zmm0, %k1
vpcmpub $0, .LC1(%rip), %zmm0, %k2
vpcmpub $0, .LC2(%rip), %zmm0, %k3
korq %k1, %k2, %k1
korq %k1, %k3, %k3
kmovq %k3, %rax
vzeroupper
ret
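A sketch of one possible source-level workaround, assuming GCC or clang on x86-64 with headers that provide the `_kor_mask64` intrinsic (current releases of both do): combine the masks with `_kor_mask64` instead of `|`. This only asks for a mask OR; it is not guaranteed that the compiler keeps the values in k-registers and emits korq. The names `any_whitespace_kor`, `any_whitespace_scalar` and `any_whitespace_dispatch` are hypothetical. The `target("avx512bw")` attribute lets the file build without `-mavx512bw`, and the `__builtin_cpu_supports` check falls back to scalar code on CPUs without AVX512BW.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of any_whitespace: the three comparison masks are
   combined with _kor_mask64 rather than the C '|' operator. This requests
   a mask-register OR; the compiler may still move through GPRs. */
__attribute__((target("avx512bw")))
static uint64_t any_whitespace_kor(const uint8_t *block) {
    __m512i s = _mm512_loadu_si512((const void *)block);
    __mmask64 sp = _mm512_cmpeq_epu8_mask(s, _mm512_set1_epi8(' '));
    __mmask64 nl = _mm512_cmpeq_epu8_mask(s, _mm512_set1_epi8('\n'));
    __mmask64 cr = _mm512_cmpeq_epu8_mask(s, _mm512_set1_epi8('\r'));
    return _kor_mask64(_kor_mask64(sp, nl), cr);
}

/* Portable fallback with the same bit layout: bit i <-> byte i. */
static uint64_t any_whitespace_scalar(const uint8_t *block) {
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        uint8_t c = block[i];
        if (c == ' ' || c == '\n' || c == '\r')
            mask |= UINT64_C(1) << i;
    }
    return mask;
}

/* Runtime dispatch using the GCC/clang builtin; block must point to
   at least 64 readable bytes. */
uint64_t any_whitespace_dispatch(const uint8_t *block) {
    if (__builtin_cpu_supports("avx512bw"))
        return any_whitespace_kor(block);
    return any_whitespace_scalar(block);
}
```

Inspecting the generated assembly of any_whitespace_kor (e.g. with -O3 -S) is the way to see whether a given compiler version honours the request.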
best regards
Wojciech