https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314
Bug ID: 70314
Summary: AVX512 not using kandw to combine comparison results
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

This comes from PR 68714 (comment 7); there are more details and suggestions there.

    typedef int T __attribute__((vector_size(64)));
    T f(T a,T b,T c,T d){
      return (a<b)&(c<d);
    }

With -march=skylake-avx512 we generate:

    _3 = VEC_COND_EXPR <a_1(D) < b_2(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
    _6 = VEC_COND_EXPR <c_4(D) < d_5(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
    _7 = _3 & _6;
    return _7;

yielding this code:

    vpcmpgtd   %zmm0, %zmm1, %k1
    vpternlogd $0xFF, %zmm4, %zmm4, %zmm4
    vmovdqa32  %zmm4, %zmm0{%k1}{z}
    vpcmpgtd   %zmm2, %zmm3, %k1
    vmovdqa32  %zmm4, %zmm2{%k1}{z}
    vpandd     %zmm2, %zmm0, %zmm0

We perform the bit_and on the mask type, whereas it would be better to do it on the vector boolean type and use 'kandw'.