https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104345

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vr...@gcc.gnu.org>:

https://gcc.gnu.org/g:9bacd7af2e3bba9ddad17e7de4e2d299419d819d

commit r12-7167-g9bacd7af2e3bba9ddad17e7de4e2d299419d819d
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Fri Feb 4 04:13:53 2022 +0100

    PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0

    This patch addresses the "increased register pressure" regression on
    nvptx-none caused by my change to transition the backend to a
    STORE_FLAG_VALUE = 1 target.  This improved code generation for the
    more common case of producing 0/1 Boolean values, but made things
    marginally worse when a 0/-1 mask value is desired.  Because nvptx
    kernels are extremely sensitive to changes in register usage, this
    was observable in the reported PR.

    This patch provides optimizations for -(cond ? 1 : 0), effectively
    simplifying this into cond ? -1 : 0, where these ternary operators are
    provided by nvptx's selp instruction.  For the specific case of
    SImode, it uses (restores) nvptx's "set" instruction, which avoids
    the need for a predicate register.
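
As an illustration of the identities involved (not the contents of the new test case, and not the machine-description patterns themselves), the following C sketch shows the source-level forms being discussed. The function names are hypothetical, and the choice of an equality comparison is an assumption based on the EQ/NE code iterator mentioned in the ChangeLog.

```c
#include <stdint.h>

/* Negated 0/1 Boolean: the form that regressed once the backend
   started producing 0/1 values (STORE_FLAG_VALUE = 1). */
static int32_t mask_neg(int32_t a, int32_t b)
{
    return -(a == b ? 1 : 0);
}

/* Equivalent 0/-1 mask written as a single select; this is the form
   the commit message says the negation is simplified into. */
static int32_t mask_sel(int32_t a, int32_t b)
{
    return a == b ? -1 : 0;
}

/* Bitwise not of a select: ~(c ? x : y) equals c ? ~x : ~y,
   so the not can be folded into the select's two constants. */
static int32_t not_sel(int32_t a, int32_t b)
{
    return ~(a != b ? 1 : 0);
}

static int32_t not_sel_folded(int32_t a, int32_t b)
{
    return a != b ? ~1 : ~0;
}
```

Each pair of functions returns the same value for every input, so a compiler is free to emit the single-select form; whether a given GCC build does so for nvptx depends on the patterns listed in the ChangeLog below.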

    This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
    with a "make" and "make -k check" with no new failures.  Unfortunately,
    the exact register usage of an nvptx kernel depends upon the version of
    the CUDA drivers being used (and the hardware), but I believe this
    change should resolve the PR (for Thomas) by improving code generation
    for the cases that regressed.

    gcc/ChangeLog:

            PR target/104345
            * config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
            (sel_false<mode>): Likewise.
            (define_code_iterator eqne): New code iterator for EQ and NE.
            (*selp<mode>_neg_<code>): New define_insn_and_split to optimize
            the negation of a selp instruction.
            (*selp<mode>_not_<code>): New define_insn_and_split to optimize
            the bitwise not of a selp instruction.
            (*setcc_int<mode>): Use set instruction for neg:SI of a selp.

    gcc/testsuite/ChangeLog:

            PR target/104345
            * gcc.target/nvptx/neg-selp.c: New test case.
