https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314
Bug ID: 70314
Summary: AVX512 not using kandw to combine comparison results
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
This comes from PR 68714 (comment 7); more details and suggestions can be found
there.
typedef int T __attribute__((vector_size(64)));
T f(T a,T b,T c,T d){
return (a<b)&(c<d);
}
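The following portable sketch shows why the reordering discussed below is valid. The helper names, `LANES`, and the `int32_t` array interface are hypothetical and are not GCC internals. In each lane, `f` returns -1 exactly when both `a<b` and `c<d` hold. Two orderings therefore give the same result: ANDing the two -1/0 vectors, or ANDing one-bit-per-lane comparison masks and expanding the combined mask to -1/0 once.

```c
#include <stdint.h>

#define LANES 16 /* a 512-bit vector of 32-bit ints, like T */

/* Models the GIMPLE: materialize each comparison as -1/0 per lane
   (the two VEC_COND_EXPRs), then AND the integer vectors (vpandd). */
void and_after_select(const int32_t a[LANES], const int32_t b[LANES],
                      const int32_t c[LANES], const int32_t d[LANES],
                      int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int32_t t3 = a[i] < b[i] ? -1 : 0;
        int32_t t6 = c[i] < d[i] ? -1 : 0;
        out[i] = t3 & t6;
    }
}

/* Models the suggested order: one bit per lane for each comparison
   (like a vpcmpd into a k register), AND the two 16-bit masks
   (what kandw computes), then expand the combined mask to -1/0 once. */
void select_after_and(const int32_t a[LANES], const int32_t b[LANES],
                      const int32_t c[LANES], const int32_t d[LANES],
                      int32_t out[LANES])
{
    uint16_t k1 = 0, k2 = 0;
    for (int i = 0; i < LANES; i++) {
        k1 |= (uint16_t)((a[i] < b[i]) << i);
        k2 |= (uint16_t)((c[i] < d[i]) << i);
    }
    uint16_t k = k1 & k2;
    for (int i = 0; i < LANES; i++)
        out[i] = ((k >> i) & 1) ? -1 : 0;
}
```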
With -march=skylake-avx512, we generate this GIMPLE:
_3 = VEC_COND_EXPR <a_1(D) < b_2(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
_6 = VEC_COND_EXPR <c_4(D) < d_5(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
_7 = _3 & _6;
return _7;
yielding this code:
vpcmpgtd %zmm0, %zmm1, %k1
vpternlogd $0xFF, %zmm4, %zmm4, %zmm4
vmovdqa32 %zmm4, %zmm0{%k1}{z}
vpcmpgtd %zmm2, %zmm3, %k1
vmovdqa32 %zmm4, %zmm2{%k1}{z}
vpandd %zmm2, %zmm0, %zmm0
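We perform the bit_and on the -1/0 integer vectors produced by the two
VEC_COND_EXPRs (the vpandd above). It would be better to perform it on the
vector boolean comparison results, which live in mask registers, using 'kandw',
and to materialize the -1/0 vector only once.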
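The sketch below is hand-written and shows the kind of sequence this would produce, using AVX-512F intrinsics from `<immintrin.h>`. It performs two compares into mask registers, combines them with one `kandw` (via `_mm512_kand`), and materializes -1/0 with a single zero-masked move. Several parts are assumptions added for illustration: the helper names, the array interface, and the runtime dispatch with a scalar fallback, which lets the code run on CPUs without AVX-512. This is not what GCC currently emits, and the exact instructions depend on the compiler.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Hypothetical helpers: out[i] = (a[i] < b[i] && c[i] < d[i]) ? -1 : 0
   for 16 int32 lanes, the same per-lane result as f() above. */

#if defined(__x86_64__) || defined(__i386__)
/* target attribute: compiles without -mavx512f (GCC/Clang). */
__attribute__((target("avx512f")))
static void and_cmp_avx512(const int32_t *a, const int32_t *b,
                           const int32_t *c, const int32_t *d, int32_t *out)
{
    __m512i va = _mm512_loadu_si512(a), vb = _mm512_loadu_si512(b);
    __m512i vc = _mm512_loadu_si512(c), vd = _mm512_loadu_si512(d);
    __mmask16 k1 = _mm512_cmplt_epi32_mask(va, vb); /* compare into a k reg */
    __mmask16 k2 = _mm512_cmplt_epi32_mask(vc, vd);
    __mmask16 k = _mm512_kand(k1, k2);              /* kandw */
    /* Materialize -1/0 once, instead of once per comparison plus vpandd. */
    __m512i r = _mm512_maskz_mov_epi32(k, _mm512_set1_epi32(-1));
    _mm512_storeu_si512(out, r);
}
#endif

/* Portable fallback with the same semantics. */
static void and_cmp_scalar(const int32_t *a, const int32_t *b,
                           const int32_t *c, const int32_t *d, int32_t *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = (a[i] < b[i] && c[i] < d[i]) ? -1 : 0;
}

void and_cmp16(const int32_t *a, const int32_t *b,
               const int32_t *c, const int32_t *d, int32_t *out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        and_cmp_avx512(a, b, c, d, out);
        return;
    }
#endif
    and_cmp_scalar(a, b, c, d, out);
}
```

Calling `and_cmp16` on four 16-element arrays fills `out` with -1 in lanes where both comparisons hold and 0 elsewhere, matching `f`.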