On 2/3/22 22:00, Roger Sayle wrote:

This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target.  This improved code generation for the
more common case of producing 0/1 Boolean values, but unfortunately
made things marginally worse when a 0/-1 mask value is desired.
nvptx kernels are extremely sensitive to changes in register usage,
which is what was observed in the reported PR.

This patch provides optimizations for -(cond ? 1 : 0), effectively
simplifying this into cond ? -1 : 0, where these ternary operators are
provided by nvptx's selp instruction, and for the specific case of
SImode, using (restoring) nvptx's "set" instruction (which avoids
the need for a predicate register).
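
As a source-level illustration of the identities involved, here is a
minimal sketch in C.  It is not the contents of neg-selp.c and says
nothing about the PTX actually emitted; the function names are
hypothetical, and the bitwise-not case assumes two's complement ints
(so that ~1 == -2 and ~0 == -1).

```c
#include <assert.h>

/* Negating a 0/1 Boolean, as written with STORE_FLAG_VALUE = 1.  */
int neg_of_bool (int a, int b) { return -(a == b ? 1 : 0); }

/* The equivalent single select: choose -1 or 0 directly.  */
int select_mask (int a, int b) { return a == b ? -1 : 0; }

/* The same idea for bitwise not of a 0/1 Boolean.  */
int not_of_bool (int a, int b) { return ~(a != b ? 1 : 0); }
int select_not (int a, int b) { return a != b ? -2 : -1; }

/* Count the operand pairs on which the two forms disagree.  */
int count_mismatches (void)
{
  int bad = 0;
  for (int a = -2; a <= 2; a++)
    for (int b = -2; b <= 2; b++)
      {
        bad += neg_of_bool (a, b) != select_mask (a, b);
        bad += not_of_bool (a, b) != select_not (a, b);
      }
  return bad;
}

/* Abort if either identity fails on the small range checked above.  */
void check_identities (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_identities () exercises both identities for EQ and NE
style comparisons over a small range of operands.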

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.  Unfortunately,
the exact register usage of an nvptx kernel depends upon the version of
the CUDA drivers being used (and the hardware), but I believe this
change should resolve the PR (for Thomas) by improving code generation
for the cases that regressed.  Ok for mainline?



LGTM, applied.

Thanks,
- Tom

2022-02-03  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        PR target/104345
        * config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
        (sel_false<mode>): Likewise.
        (define_code_iterator eqne): New code iterator for EQ and NE.
        (*selp<mode>_neg_<code>): New define_insn_and_split to optimize
        the negation of a selp instruction.
        (*selp<mode>_not_<code>): New define_insn_and_split to optimize
        the bitwise not of a selp instruction.
        (*setcc_int<mode>): Use set instruction for neg:SI of a selp.

gcc/testsuite/ChangeLog
        PR target/104345
        * gcc.target/nvptx/neg-selp.c: New test case.


Thanks in advance,
Roger