https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314

            Bug ID: 70314
           Summary: AVX512 not using kandw to combine comparison results
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

This comes from PR 68714 (comment 7); there are more details and suggestions
there.

typedef int T __attribute__((vector_size(64)));
T f(T a,T b,T c,T d){
  return (a<b)&(c<d);
}

With -march=skylake-avx512, we generate this GIMPLE:

  _3 = VEC_COND_EXPR <a_1(D) < b_2(D),
         { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 },
         { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _6 = VEC_COND_EXPR <c_4(D) < d_5(D),
         { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 },
         { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _7 = _3 & _6;
  return _7;

yielding this code:

        vpcmpgtd        %zmm0, %zmm1, %k1
        vpternlogd      $0xFF, %zmm4, %zmm4, %zmm4
        vmovdqa32       %zmm4, %zmm0{%k1}{z}
        vpcmpgtd        %zmm2, %zmm3, %k1
        vmovdqa32       %zmm4, %zmm2{%k1}{z}
        vpandd  %zmm2, %zmm0, %zmm0

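We perform the bit_and on the integer vector type, after each comparison result
has been expanded into a -1/0 vector. It would be better to do it on the vector
boolean type (the masks held in k registers) and use 'kandw', so that only the
combined mask needs to be expanded.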
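
To illustrate why the two orders are interchangeable, here is a minimal scalar
sketch in portable GNU C. It does not use AVX512 instructions. The helpers
cmp_lt_mask, expand_mask and check_sample are hypothetical names introduced
only for this illustration: they model a compare into a k register, a
zero-masked move of all-ones, and a comparison of both orders on sample data.
It assumes a compiler that supports the vector_size extension and lane
subscripting (GCC or clang).

```c
#include <assert.h>
#include <stdint.h>

typedef int T __attribute__((vector_size(64)));

/* Model of a compare into a k register: bit i is set when x[i] < y[i]. */
static uint16_t cmp_lt_mask(const T *x, const T *y) {
  uint16_t k = 0;
  for (int i = 0; i < 16; i++)
    if ((*x)[i] < (*y)[i])
      k |= (uint16_t)(1u << i);
  return k;
}

/* Model of a zero-masked move of all-ones: bit i becomes lane i (-1 or 0). */
static void expand_mask(uint16_t k, T *out) {
  for (int i = 0; i < 16; i++)
    (*out)[i] = ((k >> i) & 1) ? -1 : 0;
}

/* Checks that both orders agree on sample data; returns the combined mask. */
uint16_t check_sample(void) {
  T a = {0}, b = {0}, c = {0}, d = {0}, on_vectors, on_masks = {0};
  for (int i = 0; i < 16; i++) {
    a[i] = i;      b[i] = 8;   /* a < b in lanes 0..7  */
    c[i] = 15 - i; d[i] = 10;  /* c < d in lanes 6..15 */
  }

  /* Order used today: expand each comparison, then AND the full vectors. */
  on_vectors = (a < b) & (c < d);

  /* Suggested order: AND the 16-bit masks (the job of kandw), expand once. */
  uint16_t k = cmp_lt_mask(&a, &b) & cmp_lt_mask(&c, &d);
  expand_mask(k, &on_masks);

  for (int i = 0; i < 16; i++)
    assert(on_vectors[i] == on_masks[i]);
  return k;
}
```

In this model the suggested order needs a single expansion of the combined
mask, where the current code expands both comparison results before the AND.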
