https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246
Bug ID: 96246
Summary: [AVX512] unefficient code generatation for vpblendm*
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Target: i386, x86-64

cat test.c
---
typedef int v8si __attribute__ ((__vector_size__ (32)));
v8si
foo (v8si a, v8si b, v8si c, v8si d)
{
  return a > b ? c : d;
}
---

gcc11 -O2 -mavx512f -mavx512vl

gcc generates
---
        vpcmpd  $6, %ymm1, %ymm0, %k1
        vmovdqa32       %ymm2, %ymm3{%k1}
        vmovdqa %ymm3, %ymm0
        ret
---

which could be optimized to
---
        vpcmpd  $6, %ymm1, %ymm0, %k1
        vpblendmd       %ymm2, %ymm3, %ymm0 {%k1}
---

gcc fails to generate optimal code because, in sse.md, (define_insn "<avx512>_load<mode>_mask" has the same pattern as (define_insn "<avx512>_blendm<mode>" and appears earlier in the file. The rtx pattern is therefore always recognized as <avx512>_load<mode>_mask, which misses the opportunity in pass_reload, and the insn can't be combined into <avx512>_blendm<mode> after reload.
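
To make the difference between the two sequences concrete, here is a minimal scalar sketch in plain C of the per-lane behaviour. It is only a model under stated assumptions, not GCC or intrinsic code: the helper names (cmp_gt_mask, masked_move, blend, foo_masked_move, foo_blend) are hypothetical and chosen for illustration. The point it tries to show is that the merge-masked move uses its destination as an input, so d's register is overwritten and the result needs an extra copy, while the blend writes to a separate destination.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 256-bit ymm register holds eight 32-bit lanes */

/* Models vpcmpd $6 (predicate 6 = signed greater-than):
   bit i of the result is set when a[i] > b[i].  */
static uint8_t
cmp_gt_mask (const int32_t a[LANES], const int32_t b[LANES])
{
  uint8_t k = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] > b[i])
      k |= (uint8_t) (1u << i);
  return k;
}

/* Models vmovdqa32 src, dst{k}: dst is both an input and the output,
   so lanes whose mask bit is clear keep the old dst value.  */
static void
masked_move (int32_t dst[LANES], const int32_t src[LANES], uint8_t k)
{
  for (int i = 0; i < LANES; i++)
    if (k & (1u << i))
      dst[i] = src[i];
}

/* Models vpblendmd if_set, if_clear, dst{k}: dst is a pure output.  */
static void
blend (int32_t dst[LANES], const int32_t if_set[LANES],
       const int32_t if_clear[LANES], uint8_t k)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = (k & (1u << i)) ? if_set[i] : if_clear[i];
}

/* Sequence currently emitted: compare, merge c into d's register,
   then copy that register to the return register.  */
static void
foo_masked_move (int32_t out[LANES], const int32_t a[LANES],
                 const int32_t b[LANES], const int32_t c[LANES],
                 const int32_t d[LANES])
{
  uint8_t k = cmp_gt_mask (a, b);
  int32_t ymm3[LANES];
  for (int i = 0; i < LANES; i++)
    ymm3[i] = d[i];          /* d arrives in ymm3 */
  masked_move (ymm3, c, k);  /* vmovdqa32 %ymm2, %ymm3{%k1} */
  for (int i = 0; i < LANES; i++)
    out[i] = ymm3[i];        /* the extra vmovdqa %ymm3, %ymm0 */
}

/* Proposed sequence: compare, then blend straight into the result.  */
static void
foo_blend (int32_t out[LANES], const int32_t a[LANES],
           const int32_t b[LANES], const int32_t c[LANES],
           const int32_t d[LANES])
{
  blend (out, c, d, cmp_gt_mask (a, b));
}
```

In this model both functions compute a > b ? c : d for every lane; the only difference is the extra register copy in foo_masked_move, which corresponds to the redundant vmovdqa in the generated code.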