https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105791

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:c4320bde42c6497b701e2e6b8f1c5069bed19818

commit r13-998-gc4320bde42c6497b701e2e6b8f1c5069bed19818
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Tue Jun 7 07:49:40 2022 +0100

    Recognize vpcmov in combine with -mxop on x86.

    By way of an apology for causing PR target/105791, where I'd overlooked
    the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
    patch further improves support for TARGET_XOP's vpcmov instruction, by
    recognizing it in combine.

    Currently, the test case:

    typedef int v4si __attribute__ ((vector_size (16)));
    v4si foo(v4si c, v4si t, v4si f)
    {
        return (c&t)|(~c&f);
    }

    on x86_64 with -O2 -mxop generates:
            vpxor   %xmm2, %xmm1, %xmm1
            vpand   %xmm0, %xmm1, %xmm1
            vpxor   %xmm2, %xmm1, %xmm0
            ret

    but with this patch now generates:
            vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
            ret
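
    As a hedged illustration (not part of the patch or its test suite), the
    scalar sketch below checks that the select written in the test case and
    the xor;and;xor form computed by the three-instruction sequence agree
    bit for bit.  The function names are hypothetical, and uint32_t stands
    in for one lane of the v4si vector.

```c
#include <stdint.h>

/* Bitwise select as written in the test case: t where c is set, f elsewhere. */
static uint32_t select_and_or(uint32_t c, uint32_t t, uint32_t f)
{
    return (c & t) | (~c & f);
}

/* Equivalent xor;and;xor form, mirroring vpxor/vpand/vpxor. */
static uint32_t select_xor_and_xor(uint32_t c, uint32_t t, uint32_t f)
{
    return ((t ^ f) & c) ^ f;
}

/* Returns 1 if both forms agree for every 8-bit c, t and f.  The identity
   is per-bit, so exhausting small operands covers every bit pattern. */
static int forms_agree(void)
{
    for (uint32_t c = 0; c < 256; c++)
        for (uint32_t t = 0; t < 256; t++)
            for (uint32_t f = 0; f < 256; f++)
                if (select_and_or(c, t, f) != select_xor_and_xor(c, t, f))
                    return 0;
    return 1;
}
```

    Where c has a bit set, (t ^ f) ^ f yields t; where it is clear, the
    result is f, which is the selection vpcmov performs in one instruction.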

    On its own, the new combine splitter works fine on TARGET_64BIT, but
    alas with -m32 combine incorrectly thinks the replacement instruction
    is more expensive, as IF_THEN_ELSE isn't currently handled correctly
    in ix86_rtx_costs.  So to avoid the need for a target selector in the
    new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
    has a latency of two cycles [it's now an obsolete instruction set
    extension and there's unlikely to ever be a processor where this
    instruction has a different timing], and while there I also added
    rtx_costs for x86_64's integer conditional move instructions (which
    have single cycle latency).

    2022-06-07  Roger Sayle  <ro...@nextmovesoftware.com>

    gcc/ChangeLog
            * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
            IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
            TARGET_CMOVE's (scalar integer) conditional moves.
            * config/i386/sse.md (define_split): Recognize XOP's vpcmov
            from its equivalent (canonical) pxor;pand;pxor sequence.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/xop-pcmov3.c: New test case.
