https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Wojciech Mula from comment #6)
> Hongtao, thank you for your patch and for pinging back! I checked the code
> from this issue against version 11.2.0 (Debian 11.2.0-14), but still, there
> are KMOVQs before performing any bit ops. Here is the output from `gcc -O3
> -march=icelake-server -S`
>
>         vpcmpub $0, .LC0(%rip), %zmm0, %k0
>         vpcmpub $0, .LC1(%rip), %zmm0, %k1
>         vpcmpub $0, .LC2(%rip), %zmm0, %k2
>         kmovq   %k0, %rcx
>         kmovq   %k1, %rax
>         orq     %rcx, %rax
>         kmovq   %k2, %rdx
>         orq     %rdx, %rax
>         ret

Oh, yes. Because of PR101185, mask registers are slightly disliked, so mask
bitwise instructions are generated only if the source and destination are both
mask registers, i.e.

#include <immintrin.h>
__m512i
foo_orq (__m512i a, __m512i b, __m512i c, __m512i d)
{
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
  return _mm512_mask_add_epi8 (c, m1 | m2, a, d);
}
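For contrast, here is a minimal sketch of the other shape, where the combined mask is returned as a plain integer, so the destination of the OR is a general-purpose register rather than a mask register. This is not the code from the original report: the name `or_to_gpr`, the pointer-based signature, and the use of the `target` attribute (so that it builds without `-march`) are assumptions made for illustration. With the behaviour described above, a function like this would be expected to go through kmovq and orq rather than korq, but that has not been verified here and may differ between GCC versions.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical contrast case (not taken from the bug report): the OR of the
   two compare masks is returned in an integer register, so source and
   destination are not both mask registers.  */
__attribute__ ((target ("avx512bw")))
uint64_t
or_to_gpr (const void *a, const void *b, const void *c, const void *d)
{
  /* Unaligned 64-byte loads; the callers only need to pass 64 readable bytes.  */
  __m512i va = _mm512_loadu_si512 (a);
  __m512i vb = _mm512_loadu_si512 (b);
  __m512i vc = _mm512_loadu_si512 (c);
  __m512i vd = _mm512_loadu_si512 (d);

  /* Bit i of each mask is set when byte i of the two inputs is equal.  */
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (va, vb);
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (vc, vd);

  /* The result leaves the mask domain here.  */
  return (uint64_t) (m1 | m2);
}
```

Compiling both functions with `gcc -O3 -march=icelake-server -S` and comparing the output should show whether the OR stays in the mask registers in each case.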