https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
            Bug ID: 88798
           Summary: AVX512BW code does not use bit-operations that work on
                    mask registers
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Hi!

AVX512BW-related issue: the C compiler generates superfluous moves from
64-bit mask registers to 64-bit GPRs and then performs basic bit-ops there,
although AVX512BW supports bit-ops directly on mask registers (instructions:
korq, kandq, kxorq).

I guess the main reason is that C does not define a bit-or for the type
__mmask64, so there is always an implicit conversion to uint64_t.

Below is a sample program compiled for Cannon Lake; the CPU has (at least)
AVX512BW, AVX512VBMI and AVX512VL.

---perf.c---
#include <immintrin.h>
#include <stdint.h>

uint64_t any_whitespace(__m512i string) {
    return _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8(' '))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\n'))
         | _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\r'));
}
---eof--

$ gcc --version
gcc (Debian 8.2.0-13) 8.2.0

$ gcc perf.c -O3 -march=cannonlake -S
$ cat perf.s # redacted
any_whitespace:
        vpcmpub $0, .LC0(%rip), %zmm0, %k1
        vpcmpub $0, .LC1(%rip), %zmm0, %k2
        vpcmpub $0, .LC2(%rip), %zmm0, %k3
        kmovq   %k1, %rcx
        kmovq   %k2, %rdx
        orq     %rcx, %rdx
        kmovq   %k3, %rax
        orq     %rdx, %rax
        vzeroupper
        ret

I'd rather expect to get something like:

any_whitespace:
        vpcmpub $0, .LC0(%rip), %zmm0, %k1
        vpcmpub $0, .LC1(%rip), %zmm0, %k2
        vpcmpub $0, .LC2(%rip), %zmm0, %k3
        korq    %k1, %k2, %k1
        korq    %k1, %k3, %k3
        kmovq   %k3, %rax
        vzeroupper
        ret

best regards
Wojciech
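
For comparison, here is a minimal sketch of a possible source-level workaround. It is not part of the report above and has not been checked against GCC 8.2.0. It combines the three comparison results with the explicit mask intrinsics _kor_mask64 and _cvtmask64_u64 instead of the C `|` operator. The names any_whitespace_kor, any_whitespace_scalar and any_whitespace_checked are made up for illustration, and the input is taken as a 64-byte buffer rather than an __m512i so that the sketch builds without -march flags. The compiler may still decide to move the masks to GPRs, so this only states the intent more directly.

```c
#include <immintrin.h>
#include <stdint.h>

/* Portable reference: bit i is set when buf[i] is ' ', '\n' or '\r'. */
uint64_t any_whitespace_scalar(const unsigned char buf[64])
{
    uint64_t mask = 0;
    for (int i = 0; i < 64; i++)
        if (buf[i] == ' ' || buf[i] == '\n' || buf[i] == '\r')
            mask |= (uint64_t)1 << i;
    return mask;
}

/* Hypothetical variant of any_whitespace from the report: the masks are
 * OR-ed with _kor_mask64, which is meant to map to korq, and converted to
 * an integer only once at the end. Assumes a compiler that accepts the
 * GCC-style target attribute and ships these AVX512BW mask intrinsics. */
__attribute__((target("avx512bw")))
uint64_t any_whitespace_kor(const unsigned char buf[64])
{
    __m512i string = _mm512_loadu_si512(buf);
    __mmask64 sp = _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8(' '));
    __mmask64 nl = _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\n'));
    __mmask64 cr = _mm512_cmpeq_epu8_mask(string, _mm512_set1_epi8('\r'));
    return _cvtmask64_u64(_kor_mask64(_kor_mask64(sp, nl), cr));
}

/* Runtime dispatch so the sketch also runs on CPUs without AVX512BW. */
uint64_t any_whitespace_checked(const unsigned char buf[64])
{
    if (__builtin_cpu_supports("avx512bw"))
        return any_whitespace_kor(buf);
    return any_whitespace_scalar(buf);
}
```

Compiling any_whitespace_kor with -O3 -S and looking for korq in the output would show whether a given compiler version keeps the OR in the mask registers.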