https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246

            Bug ID: 96246
           Summary: [AVX512] inefficient code generation for vpblendm*
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
            Target: i386, x86-64

cat test.c

---
typedef int v8si __attribute__ ((__vector_size__ (32)));
v8si
foo (v8si a, v8si b, v8si c, v8si d)
{
    return a > b ? c : d;
}
---

gcc11 -O2 -mavx512f -mavx512vl

gcc generates
---
        vpcmpd  $6, %ymm1, %ymm0, %k1
        vmovdqa32       %ymm2, %ymm3{%k1}
        vmovdqa %ymm3, %ymm0 
        ret
---

This could be optimized to

---
        vpcmpd  $6, %ymm1, %ymm0, %k1
        vpblendmd       %ymm2, %ymm3, %ymm0 {%k1}
---
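
For reference, the lane-wise behaviour expected from the compare-plus-blend
pair can be written as a scalar model. This is only an illustrative sketch:
the function names (cmp_gt_mask, blend_mask, model_demo) and the sample
inputs are made up here and are not part of gcc or of the testcase. It
assumes that predicate $6 of vpcmpd is the signed greater-than compare, as
in the assembly above.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* a 256-bit vector holds eight 32-bit ints */

/* Models vpcmpd $6 (signed a > b): bit i of the result is set
   when a[i] > b[i].  Hypothetical helper, for illustration only. */
uint8_t cmp_gt_mask(const int32_t a[LANES], const int32_t b[LANES])
{
    uint8_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (a[i] > b[i])
            k |= (uint8_t)(1u << i);
    return k;
}

/* Models the mask blend: lane i takes c[i] when bit i of k is set,
   otherwise d[i].  The result goes to a separate output, so neither
   input has to be overwritten and no extra move is needed. */
void blend_mask(uint8_t k, const int32_t c[LANES],
                const int32_t d[LANES], int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = ((k >> i) & 1) ? c[i] : d[i];
}

/* Small self-check with made-up inputs. */
void model_demo(void)
{
    const int32_t a[LANES] = {1, 5, 3, 7, 0, -1, 9, 2};
    const int32_t b[LANES] = {0, 5, 4, 6, 0, -2, 10, 1};
    const int32_t c[LANES] = {100, 101, 102, 103, 104, 105, 106, 107};
    const int32_t d[LANES] = {-1, -2, -3, -4, -5, -6, -7, -8};
    int32_t out[LANES];

    uint8_t k = cmp_gt_mask(a, b);   /* lanes 0, 3, 5 and 7 satisfy a > b */
    blend_mask(k, c, d, out);

    assert(k == 0xA9);
    assert(out[0] == 100 && out[1] == -2);
}
```

The model has the same shape as foo in test.c: one mask from the compare,
then a three-operand select into a fresh destination.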

gcc fails to generate optimal code because, in sse.md,

(define_insn "<avx512>_load<mode>_mask" has the same pattern as
(define_insn "<avx512>_blendm<mode>" and appears earlier in the file. The rtx
pattern is therefore always recognized as <avx512>_load<mode>_mask, so the
opportunity to use the blend form is missed in pass_reload, and the insns can't
be combined into <avx512>_blendm<mode> after reload.
