https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85832
Bug ID: 85832
Summary: [AVX512] possible shorter code when comparing with vector of zeros
Product: gcc
Version: 7.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wojciech_mula at poczta dot onet.pl
Target Milestone: ---
Consider this simple function, which yields a mask for the zero elements:
---cat cmp.c---
#include <immintrin.h>
int fun(__m512i x) {
    return _mm512_cmpeq_epi32_mask(x, _mm512_setzero_si512());
}
---eof
$ gcc --version
gcc (Debian 7.3.0-16) 7.3.0
$ gcc -O2 -S -mavx512f cmp.c && cat cmp.s
fun:
        vpxord %zmm1, %zmm1, %zmm1 # <<< HERE
        vpcmpeqd %zmm1, %zmm0, %k1 # <<<
        kmovw %k1, %eax
        vzeroupper
        ret
GCC 8.1.0 generates the same code (as checked on godbolt.org).
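The pair of instructions VPXORD/VPCMPEQD can be replaced with a single
VPTESTNMD %zmm0, %zmm0, %k1. VPTESTNMD sets a bit in k1 for each 32-bit
element where (zmm0 AND zmm0) is zero, which is exactly the comparison of
zmm0 with zeros. (VPTESTMD sets the bit for the non-zero elements instead,
i.e. it produces the complementary mask.)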
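
To make the equivalence concrete, here is a minimal scalar sketch of the per-element semantics of the three instructions. It is only a model for illustration: the helper names (model_vpcmpeqd, model_vptestmd, model_vptestnmd) and the LANES constant are hypothetical and not part of GCC or of any intrinsics header.

```c
#include <stdint.h>

#define LANES 16 /* a 512-bit register holds 16 32-bit elements */

/* Model of VPCMPEQD: mask bit i is set when a[i] == b[i]. */
static uint16_t model_vpcmpeqd(const uint32_t a[LANES], const uint32_t b[LANES])
{
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (a[i] == b[i])
            k |= (uint16_t)(1u << i);
    return k;
}

/* Model of VPTESTMD: mask bit i is set when (a[i] AND b[i]) is non-zero. */
static uint16_t model_vptestmd(const uint32_t a[LANES], const uint32_t b[LANES])
{
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if ((a[i] & b[i]) != 0)
            k |= (uint16_t)(1u << i);
    return k;
}

/* Model of VPTESTNMD: mask bit i is set when (a[i] AND b[i]) is zero. */
static uint16_t model_vptestnmd(const uint32_t a[LANES], const uint32_t b[LANES])
{
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if ((a[i] & b[i]) == 0)
            k |= (uint16_t)(1u << i);
    return k;
}
```

For any input x, model_vptestnmd(x, x) equals model_vpcmpeqd(x, zeros), while model_vptestmd(x, x) is the bitwise complement of that mask. At the source level, the intrinsic that corresponds to VPTESTNMD should be _mm512_testn_epi32_mask(x, x).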