https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88798

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Wojciech Mula from comment #6)
> Hongtao, thank you for your patch and for pinging back! I checked the code
> from this issue against version 11.2.0 (Debian 11.2.0-14), but still, there
> are KMOVQs before performing any bit ops. Here is the output from `gcc -O3
> -march=icelake-server -S`
> 
>     vpcmpub $0, .LC0(%rip), %zmm0, %k0
>     vpcmpub $0, .LC1(%rip), %zmm0, %k1
>     vpcmpub $0, .LC2(%rip), %zmm0, %k2
>     kmovq   %k0, %rcx
>     kmovq   %k1, %rax
>     orq %rcx, %rax
>     kmovq   %k2, %rdx
>     orq %rdx, %rax
>     ret

Oh, yes. Because of PR101185, mask registers are slightly disliked, so mask
bitwise instructions are generated only if both the source and the destination
are mask registers.

i.e.

#include <immintrin.h>
__m512i
foo_orq (__m512i a, __m512i b, __m512i c, __m512i d)
{
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
  return _mm512_mask_add_epi8 (c, m1 | m2, a, d);
}
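
For contrast, here is a minimal sketch of the opposite case, where the result of
the OR is returned as an integer and so its destination is a general-purpose
register. Under the rule above I would expect this to still go through kmovq
and orq, as in the output quoted from comment #6, but I have not checked the
generated code. The name bar_orq, the pointer arguments and the target
attribute are assumptions made only to keep the sketch self-contained; they are
not taken from the original testcase.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical counterpart to foo_orq: the OR of the two masks is
   returned as a 64-bit integer, so the destination is a
   general-purpose register rather than a mask register.  */
__attribute__ ((target ("avx512bw")))
uint64_t
bar_orq (const void *pa, const void *pb, const void *pc, const void *pd)
{
  /* Unaligned loads of the four 64-byte inputs.  */
  __m512i a = _mm512_loadu_si512 (pa);
  __m512i b = _mm512_loadu_si512 (pb);
  __m512i c = _mm512_loadu_si512 (pc);
  __m512i d = _mm512_loadu_si512 (pd);

  /* Bit i of each mask is set when byte i of the two operands is equal.  */
  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);

  /* The result leaves the mask domain here.  */
  return m1 | m2;
}
```

Comparing the `-O3 -march=icelake-server -S` output of foo_orq and bar_orq
should show whether the OR is done on mask registers or on general-purpose
registers in each case.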
