[Bug target/103750] [i386] GCC schedules KMOV instructions that destroys performance in loop

cvs-commit at gcc dot gnu.org via Gcc-bugs Sun, 08 Jun 2025 19:22:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103750


--- Comment #22 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>:

https://gcc.gnu.org/g:cdfa5fe03512f7ac5a293480f634df68fc973060

commit r16-1298-gcdfa5fe03512f7ac5a293480f634df68fc973060
Author: liuhongt <hongtao....@intel.com>
Date:   Tue Jun 3 14:12:23 2025 +0800

    Also handle avx512 kmask & immediate 15 or 3 when VF is 4/2.

    Like r16-105-g599bca27dc37b3, this patch handles the redundant clean-up
    of the upper mask bits for maskload, i.e.:
    Successfully matched this instruction:
    (set (reg:V4DF 175)
        (vec_merge:V4DF (unspec:V4DF [
                    (mem:V4DF (plus:DI (reg/v/f:DI 155 [ b ])
                            (reg:DI 143 [ ivtmp.56 ])) [1  S32 A64])
                ] UNSPEC_MASKLOAD)
            (const_vector:V4DF [
                    (const_double:DF 0.0 [0x0.0p+0]) repeated x4
                ])
            (and:QI (reg:QI 125 [ mask__29.16 ])
                (const_int 15 [0xf]))))

    For maskstore, the generated code already looks optimal (at least I
    couldn't construct a testcase that shows otherwise), so the patch only
    handles maskload.

    gcc/ChangeLog:

            PR target/103750
            * config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
            maskload.
            * config/i386/sse.md (*<avx512>_load<mode>mask_and15): New
            define_insn_and_split.
            (*<avx512>_load<mode>mask_and3): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512f-pr103750-3.c: New test.
