[Bug rtl-optimization/123322] [16 Regression] worse czero generation for GCC 16 vs GCC 15

cvs-commit at gcc dot gnu.org via Gcc-bugs Tue, 03 Feb 2026 06:27:14 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123322


--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:e7c2fd08ce09fa31240d094d49d5e2a3ada552d1

commit r16-7261-ge7c2fd08ce09fa31240d094d49d5e2a3ada552d1
Author: Jeff Law <[email protected]>
Date:   Tue Feb 3 07:24:53 2026 -0700

    [RISC-V][PR rtl-optimization/123322] Split general conditional moves into
simpler sequences

    This has gone round and round a few times... But I think we're finally at a
    resolution of the regression. There's more work to do for gcc-17 though.

    ---

    At its core this regression is caused by more accurately costing branches
    during if-conversion.  The more accurate cost causes the multi-set path
during
    the first ifcvt pass to (profitably) convert the sequence.  However, it
would
    have been more profitable to have waited for things to be cleaned up and by
the
    2nd ifcvt pass we'd be able to convert using the condzero path which is
*more*
    profitable than the multi-set code (which is what happens in gcc-15).

    One possible solution would be to disable the multi-set ifcvt code during
its
    first pass.  That was evaluated and ultimately rejected because of clear
    undesirable impacts.  It may still be a good thing to do, but certainly not
at
    this stage of gcc-16.

    Another path that was considered was recognizing the sign extension in the
    if-convertable block and avoiding the multi-set path in that case on the
    assumption it'd be cleaned up later and the condzero path would be used. 
That
    felt way too target specific and optimistic about removal of the sign
    extension.

    Daniel looked at if-converting these cases during phiopt; that generally
looks
    like a good thing to do, but had notable undesirable fallout on RISC-V and
I
    was significantly worried it would re-introduce sequences on x86 and
aarch64
    that we just fixed.  ie, solving one regression and creating another.  So
that
    feels like a gcc-17 evaluation.

    After much head-banging it occurred to me that we could tackle this in the
    target with a tiny bit of help from combine.  These conditional ops are 4
    instruction sequences.  The operation, a pair of czeros and an add/ior to
    select between the output of the two czeros.  Combine is pretty
conservative in
    when it chooses to try 4 insn sequences.

    So step #1 was to loosen the restrictions on combine a little bit.  If it
sees
    an IF_THEN_ELSE, then it considers that "good" which in turn allows us to
try 4
    insn combinations a bit more often.

    That exposes the potential for 4 insn combinations and since the IOR/XOR
case
    is a 4->2 sequence fixing those regressions is just a good define_split.  
In
    fact, we can use that splitter for any case where the neutral operand is
zero
    (IOR, XOR, PLUS, MINUS, etc).  We need a second pattern to handle reversed
    operands since we're using code iterators and match_dup for the matching
    operand rather than something like a matching constraint.

    THat still leaves conditional shifts regressing due to the same problem. 
For
    shifts/rotates, the count is in QImode, so we need a different pattern to
    handle those cases, but it has the same basic structure and behavior.

    AND was *still* regressing though.  That would require a 4->3 split which
isn't
    supported by combine.  Given the issues with the other paths attempted, I
    ultimately decided on a define_insn_and_split (boo!).  This shouldn't be
    happening a ton like mvconst_internal, so not great, but workable.  This
also
    required recognizing the pattern and giving it a suitable cost
(COSTS_N_INSNS
    (3)).

    That fixes the regressions.  We're still not generating ideal code for rv64
    when the operands are 32bit quantities and the architecture provides an
    instruction variant that sign extends from 32 to 64 bits (add, subtract,
shifts
    and rotates).  But the code we generate for gcc-16 is better than we were
    generating for gcc-15, it's just not optimal.  So there's a TODO for gcc-17
as
    well.

    This was bootstrapped and regression tested on x86, alpha, riscv, armv7,
hppa,
    maybe others, but those definitely for sure.  It was also tested on the
various
    crosses without regressions.  Waiting on pre-commit testing to render a
verdict
    before going forward.

            PR rtl-optimization/123322
    gcc/
            * combine.cc (try_combine): Consider an IF_THEN_ELSE "good" when
            evaluating if 4 insn combinations should be tried.
            * config/riscv/iterators.md (zero_is_neutral_op): New iterator.
            (zero_is_neutral_op_c): Likewise.
            (any_shift_rotate): Likewise.
            * config/riscv/riscv.cc (riscv_rtx_costs): Recognize the
conditional
            AND RTL and cost is appropriately.
            * config/riscv/zicond.md: Add patterns to rewrite general
conditional
            move sequences into simpler forms.

    gcc/testsuite
            * gcc.target/riscv/pr123322.c: New test.

[Bug rtl-optimization/123322] [16 Regression] worse czero generation for GCC 16 vs GCC 15

Reply via email to