On 12/23/2025 11:54 PM, Andrew Pinski wrote:
On Tue, Dec 23, 2025 at 8:33 PM Jeffrey Law <[email protected]> wrote:

On 12/23/2025 6:44 PM, Andrew Pinski wrote:
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.

Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.

gcc/ChangeLog:

       * ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
       last.
Please no.   We very much want to use condzero_arith rather than cmove-based
things -- the latter would be pretty bad in general for RISC-V.  We
really should dive into why the code isn't as good as we'd like on other
platforms.
I looked and I noticed noce_try_cmove_arith fails for riscv for most
(all?) of the testcases I tried.
We may not have good coverage here.   But it was a huge source of performance issues with gcc-15 -- way too many generalized conditional moves that should have been condzero style sequences.
I noticed in some cases noce_convert_multiple_sets happened even
before noce_try_cond_zero_arith had a chance to do its thing.
The code generation from noce_convert_multiple_sets is worse for riscv
for sure and noce_convert_multiple_sets happens even before
noce_try_cond_zero_arith and noce_try_cmove_arith could even happen.

Note looking into cases where noce_try_cond_zero_arith fails (and
noce_try_cmove_arith also fails) on riscv I find that the check:
```
   if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
     return false;
```
is too restrictive but that is a different story.

But that's capturing the key concept.  Namely that we want to target a conditional zero idiom rather than a conditional move idiom.  RISC-V has instructions for the former.  The latter requires a pair of conditional zeros with opposite polarity on the test *and* an additional instruction to select one of the outputs from the pair of conditional zeros.
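
To make the instruction count concrete, here is a minimal C model of the two Zicond instructions and of a generalized conditional move built from them. The helper names (`czero_eqz`, `czero_nez`, `cmove_via_czero`) are made up for illustration and are not GCC or intrinsic names; this is only a sketch of the semantics.

```c
#include <stdint.h>

/* Hypothetical C models of the Zicond instructions (not real intrinsics). */
static inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;   /* zero the value when the condition is zero */
}

static inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;   /* zero the value when the condition is nonzero */
}

/* Generalized conditional move, cond ? x : y, needs three operations. */
uint64_t cmove_via_czero(uint64_t cond, uint64_t x, uint64_t y)
{
    uint64_t t1 = czero_eqz(x, cond);  /* x if cond != 0, else 0 */
    uint64_t t2 = czero_nez(y, cond);  /* y if cond == 0, else 0 */
    return t1 | t2;                    /* one side is always 0, so OR (or ADD) selects */
}
```

A conditional-zero idiom needs only the first helper, which is why it maps to a single instruction.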


Now on to the question about other targets.
The question here is whether `a OP= (cond ? 0 : b)` is better than
`a = (cond ? a : a OP b)`.
LLVM seems to always use `a OP= (cond ? 0 : b)` (except for & where
it uses `a &= cond ? -1 : b`).
I think both are ok for aarch64; for x86_64, I saw a notice that doing
the conditional move before the operation is better on some
micro-architectures.
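
For reference, the two source-level shapes can be written out in C as below, using `+` as the example OP and `&` for the special case where the neutral element is -1 rather than 0. The function names are invented for this sketch; it only shows that the two forms compute the same value, not what any particular compiler emits.

```c
/* "Conditional zero" shape: select the operand first, then apply OP. */
long add_czero_form(long cond, long a, long b)
{
    a += (cond ? 0 : b);       /* 0 is the neutral element for + */
    return a;
}

/* "Conditional move" shape: apply OP first, then select the result. */
long add_cmove_form(long cond, long a, long b)
{
    a = (cond ? a : a + b);
    return a;
}

/* For &, the neutral element is all-ones, hence -1 instead of 0. */
long and_czero_form(long cond, long a, long b)
{
    a &= (cond ? -1 : b);
    return a;
}

long and_cmove_form(long cond, long a, long b)
{
    a = (cond ? a : (a & b));
    return a;
}
```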

The forms with explicit zeros are definitely preferred for RISC-V as those correspond to czero instructions.  That is precisely the form that condzero_arith is targeting.

Let's take a conditional shift by 6 (since that's important for one of the spec2017 benchmarks, I forget which).

Good code for riscv would look like:

    li t0,6
    czero.eqz t0,t0,<condreg>
    sll dest,src,t0


Contrast that with a conditional move sequence, which will look like:

    slli tmp1,src,6
    czero.eqz tmp1,tmp1,<condreg>
    czero.nez tmp2,src,<condreg>
    add dest,tmp1,tmp2


Or worse yet, branching...
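
As a rough cross-check, the two RISC-V sequences for the conditional shift by 6 can be modelled in C one instruction per statement. The function names are hypothetical and the ternaries stand in for `czero.eqz`/`czero.nez`; this is a sketch to show that both compute `cond ? src << 6 : src`, with three versus four operations.

```c
#include <stdint.h>

/* Conditional-zero style: zero the shift amount, then shift. */
uint64_t shift_czero_style(uint64_t cond, uint64_t src)
{
    uint64_t amt = 6;                 /* li        t0,6         */
    amt = (cond == 0) ? 0 : amt;      /* czero.eqz t0,t0,cond   */
    return src << amt;                /* sll       dest,src,t0  */
}

/* Conditional-move style: compute both arms, zero one, combine. */
uint64_t shift_cmove_style(uint64_t cond, uint64_t src)
{
    uint64_t tmp1 = src << 6;                  /* slli      tmp1,src,6     */
    tmp1 = (cond == 0) ? 0 : tmp1;             /* czero.eqz tmp1,tmp1,cond */
    uint64_t tmp2 = (cond != 0) ? 0 : src;     /* czero.nez tmp2,src,cond  */
    return tmp1 + tmp2;                        /* add       dest,tmp1,tmp2 */
}
```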


BUT the `&` case is worse without this patch.
testcase:
```
long f(long a, long b, long c)
{
   return a ? b : b & c;
}
```

In GCC 15 (and with this patch) GCC produces on targets with cmov
(aarch64, x86_64 is similar):
```
         and     x2, x1, x2
         cmp     x0, 0
         csel    x0, x2, x1, eq
```

Without we get:
```
         cmp     x0, 0
         csel    x0, x1, xzr, ne
         and     x1, x1, x2
         orr     x0, x1, x0
```
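
Reading the two aarch64 sequences back into C, one statement per instruction, gives roughly the following. The function names are invented for this sketch; it only illustrates that both sequences implement the testcase, with the second needing one extra operation.

```c
/* The testcase as written. */
long and_select_ref(long a, long b, long c)
{
    return a ? b : b & c;
}

/* Three-instruction shape: and, cmp, csel. */
long and_select_3insn(long a, long b, long c)
{
    long t = b & c;              /* and  x2, x1, x2     */
    return (a == 0) ? t : b;     /* cmp + csel ..., eq  */
}

/* Four-instruction shape: cmp, csel against zero, and, orr. */
long and_select_4insn(long a, long b, long c)
{
    long m = (a != 0) ? b : 0;   /* cmp + csel x0, x1, xzr, ne */
    long t = b & c;              /* and  x1, x1, x2            */
    return t | m;                /* orr  x0, x1, x0            */
}
```

The last form is correct because `b & c` only has bits that are also set in `b`, so OR-ing in `b` yields `b`, and OR-ing in 0 leaves `b & c`.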

So what we have is targets which want two different approaches to the basic code generation strategy.  Often we'd look to tackle this with a cost function.  We could do that, but it'd mean one target is going to have to have combiner patterns (or simplify-rtx adjustments) for the case where the less efficient sequence works, but could be improved.  Those are going to be *fugly* -- been there and you can see the evidence in zicond.md IIRC (assuming I upstreamed that).

We can't really key on an optab as the RISC-V port claims to support a generalized conditional move via an expander that handles the generalized case, generating the appropriate code to handle the limited conditions as well as canonicalization of operands.  Having that pattern isn't ideal, but it really helps as a fallback path for ifcvt transformations.

I guess we could synthesize the two styles once, cost them, then use that result to guide expansions going forward, i.e., a prefer_czero vs prefer_cmove kind of property, then test that in the czero_arith path, punting to the cmove_arith path if there's no benefit to the czero form (or active harm as we see above).
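
Very roughly, and purely as an illustration of the idea rather than anything resembling actual ifcvt.cc code, the "cost once, then remember" scheme could look like the toy C below. Every name here (`cond_style`, `style_costs`, `preferred_style`, `try_czero_arith_p`) is hypothetical, and the costs are plain integers standing in for whatever costing of the synthesized sequences would really produce.

```c
/* Hypothetical property recording which idiom the target prefers. */
enum cond_style { STYLE_UNKNOWN, STYLE_PREFER_CZERO, STYLE_PREFER_CMOVE };

/* Hypothetical costs of the two synthesized sequences. */
struct style_costs {
    int czero_seq_cost;
    int cmove_seq_cost;
};

/* Cached answer: computed on first use, reused afterwards. */
static enum cond_style cached_style = STYLE_UNKNOWN;

enum cond_style preferred_style(const struct style_costs *c)
{
    if (cached_style == STYLE_UNKNOWN)
        cached_style = (c->czero_seq_cost <= c->cmove_seq_cost)
                           ? STYLE_PREFER_CZERO
                           : STYLE_PREFER_CMOVE;
    return cached_style;
}

/* Nonzero: attempt the czero_arith path.  Zero: punt to cmove_arith. */
int try_czero_arith_p(const struct style_costs *c)
{
    return preferred_style(c) == STYLE_PREFER_CZERO;
}
```

Because the result is cached, later queries return the first decision even if different costs are passed in; a real implementation would presumably need to key the cache on operation and mode.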

Other ideas?


jeff
