On Fri, Dec 26, 2025 at 11:17 AM Jeffrey Law <[email protected]> wrote:
>
>
> On 12/23/2025 11:54 PM, Andrew Pinski wrote:
> > On Tue, Dec 23, 2025 at 8:33 PM Jeffrey Law <[email protected]> wrote:
> >>
> >> On 12/23/2025 6:44 PM, Andrew Pinski wrote:
> >>> I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
> >>> would produce worse code than noce_try_cmove_arith.
> >>> So we should do noce_try_cond_zero_arith last instead
> >>> of before noce_try_cmove_arith.
> >>>
> >>> Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
> >>> Also checked to make sure riscv testcases still work.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> * ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
> >>> last.
> >> Please no. We very much want to use condzero_arith rather than cmove
> >> based things -- that would be pretty bad in general for RISC-V. We
> >> really should dive into why the code isn't as good as we'd like on other
> >> platforms.
> > I looked and I noticed noce_try_cmove_arith fails for riscv for most
> > (all?) of the testcases I tried.
> We may not have good coverage here. But it was a huge source of
> performance issues with gcc-15 -- way too many generalized conditional
> moves that should have been condzero style sequences.
> > I noticed that in some cases noce_convert_multiple_sets fires before
> > noce_try_cond_zero_arith or noce_try_cmove_arith even get a chance to
> > run, and the code generation from noce_convert_multiple_sets is
> > certainly worse for riscv.
> >
> > Note: looking into cases on riscv where noce_try_cond_zero_arith fails
> > (and noce_try_cmove_arith also fails), I find that the check:
> > ```
> > if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
> > return false;
> > ```
> > is too restrictive, but that is a different story.
>
> But that's capturing the key concept. Namely that we want to target a
> conditional zero idiom rather than a conditional move idiom. RISC-V has
> instructions for the former. The latter requires a pair of conditional
> zeros with opposite polarity on the test *and* an additional instruction
> to select one of the outputs from the pair of conditional zeros.
I understand there are riscv instructions which handle `a!=0?b:c`, but
I said the check was too restrictive
because of cases like this:
```
long
test_ADD_ceqz_x (long x, long z, long c)
{
if (c < 10)
x = x + z;
return x;
}
```
This does not result in a czero.eqz being used.
I assumed we wanted:
```
slti a2,a2,10
czero.eqz a2,a1,a2
add a0,a0,a2
ret
```
But we get the branching version instead.
If I remove the REG_P/const0_rtx check, then I get the above.
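A small C model may help show what the desired sequence computes. It is only a sketch: it assumes the Zicond definition of czero.eqz (rd = rs2 == 0 ? 0 : rs1), and `czero_eqz` and `test_ADD_ceqz_x_czero` are hypothetical helpers that mirror the assembly one instruction at a time, not anything GCC generates internally.

```c
/* Model of the Zicond czero.eqz rd, rs1, rs2 instruction:
   rd = (rs2 == 0) ? 0 : rs1.  Hypothetical helper.  */
static long
czero_eqz (long rs1, long rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* The testcase as written: a conditional add guarded by a branch.  */
long
test_ADD_ceqz_x (long x, long z, long c)
{
  if (c < 10)
    x = x + z;
  return x;
}

/* Branchless equivalent, one statement per instruction of:
     slti      a2,a2,10
     czero.eqz a2,a1,a2
     add       a0,a0,a2  */
long
test_ADD_ceqz_x_czero (long x, long z, long c)
{
  long t = c < 10;       /* slti: t = 1 if c < 10, else 0 */
  t = czero_eqz (z, t);  /* czero.eqz: t = z if c < 10, else 0 */
  return x + t;          /* add: x + z, or x + 0 */
}
```

The condition here is `c < 10`, not a register compared against zero, so the REG_P/const0_rtx check rejects it even though the slti result is exactly the zero/nonzero value czero.eqz wants.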
>
>
> > Now on to the question about other targets.
> > The question here is whether `a OP= (cond ? 0 : b)` is better than
> > `a = (cond ? a : a OP b)`.
> > LLVM seems to always use `a OP= (cond ? 0 : b)` (except for `&`, where
> > it uses `a &= cond ? -1 : b`).
> > I think both forms are OK for aarch64; for x86_64, I saw a note that
> > doing the conditional move before the operation is better on some
> > micro-architectures.
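For reference, the two shapes are interchangeable because each operation has an identity element that can be selected in place of `b`: 0 for `+` and `|`, all-ones for `&` (hence the `-1`). The C sketch below, with illustrative function names, checks that both shapes agree; which one is cheaper is a per-target question, not a correctness one.

```c
#include <stdbool.h>

/* "Operate, then select": a = (cond ? a : a OP b).  */
static long add_then_select (bool cond, long a, long b) { return cond ? a : a + b; }
static long ior_then_select (bool cond, long a, long b) { return cond ? a : (a | b); }
static long and_then_select (bool cond, long a, long b) { return cond ? a : (a & b); }

/* "Select, then operate": a OP= (cond ? identity : b).
   The identity is 0 for + and |, but all-ones (-1) for &.  */
static long add_select_first (bool cond, long a, long b) { return a + (cond ? 0 : b); }
static long ior_select_first (bool cond, long a, long b) { return a | (cond ? 0 : b); }
static long and_select_first (bool cond, long a, long b) { return a & (cond ? -1L : b); }

/* Returns true if both shapes agree for every op and both values of cond.  */
static bool
shapes_agree (long a, long b)
{
  for (int i = 0; i < 2; i++)
    {
      bool cond = i != 0;
      if (add_then_select (cond, a, b) != add_select_first (cond, a, b)
          || ior_then_select (cond, a, b) != ior_select_first (cond, a, b)
          || and_then_select (cond, a, b) != and_select_first (cond, a, b))
        return false;
    }
  return true;
}
```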
>
> The forms with explicit zeros are definitely preferred for RISC-V as
> those correspond to czero instructions. That is precisely the form that
> condzero_arith is targeting.
>
> Let's take a conditional shift by 6 (since that's important for one of
> the spec2017 benchmarks, I forget which).
>
> Good code for riscv would look like:
>
> li t0,6
> czero.eqz t0,t0,<condreg>
> sll dest,src,t0
>
> Contrast that with a conditional move sequence, which will look like:
>
> slli tmp1,src,6
> czero.eqz tmp1,tmp1,<condreg>
> czero.nez tmp2,src,<condreg>
> add dest,tmp1,tmp2
>
> Or worse yet, branching...
>
>
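The two RISC-V sequences can be modeled in C to confirm they compute the same thing and to make the instruction-count difference visible (three operations versus four). This is a sketch that assumes Zicond semantics for czero.eqz/czero.nez on RV64; the helper names are hypothetical.

```c
#include <stdint.h>

/* Models of the Zicond instructions (RV64):
     czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
     czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1  */
static uint64_t czero_eqz (uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint64_t czero_nez (uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* Reference semantics: shift src left by 6 only when cond is nonzero.  */
static uint64_t
cond_shift_ref (uint64_t src, uint64_t cond)
{
  return cond ? src << 6 : src;
}

/* condzero_arith shape: li; czero.eqz; sll (3 instructions).  */
static uint64_t
cond_shift_czero (uint64_t src, uint64_t cond)
{
  uint64_t t0 = 6;            /* li t0,6 */
  t0 = czero_eqz (t0, cond);  /* t0 = cond ? 6 : 0 */
  return src << (t0 & 63);    /* sll uses the low 6 bits of rs2 */
}

/* cmove shape: slli; czero.eqz; czero.nez; add (4 instructions).  */
static uint64_t
cond_shift_cmove (uint64_t src, uint64_t cond)
{
  uint64_t tmp1 = src << 6;               /* slli tmp1,src,6 */
  tmp1 = czero_eqz (tmp1, cond);          /* tmp1 = cond ? src << 6 : 0 */
  uint64_t tmp2 = czero_nez (src, cond);  /* tmp2 = cond ? 0 : src */
  return tmp1 + tmp2;                     /* at most one term is nonzero */
}
```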
> > BUT the `&` case is worse without this patch.
> > testcase:
> > ```
> > long f(long a, long b, long c)
> > {
> > return a ? b : b & c;
> > }
> > ```
> >
> > In GCC 15 (and with this patch), GCC produces the following on targets
> > with cmov (aarch64 shown; x86_64 is similar):
> > ```
> > and x2, x1, x2
> > cmp x0, 0
> > csel x0, x2, x1, eq
> > ```
> >
> > Without this patch, we get:
> > ```
> > cmp x0, 0
> > csel x0, x1, xzr, ne
> > and x1, x1, x2
> > orr x0, x1, x0
> > ```
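The following C sketch transliterates the two aarch64 sequences (roughly one statement per instruction, with cmp+csel folded into one select) to show that both are correct; the only difference is the extra `orr`, i.e. four instructions versus three. Function names other than `f` are made up for illustration.

```c
/* The testcase.  */
long
f (long a, long b, long c)
{
  return a ? b : b & c;
}

/* GCC 15 / with-patch shape (3 instructions):
     and  x2, x1, x2
     cmp  x0, 0
     csel x0, x2, x1, eq  */
long
f_and_csel (long a, long b, long c)
{
  long t = b & c;         /* and: t = b & c */
  return a == 0 ? t : b;  /* cmp + csel ... eq */
}

/* Without-patch shape (4 instructions):
     cmp  x0, 0
     csel x0, x1, xzr, ne
     and  x1, x1, x2
     orr  x0, x1, x0  */
long
f_czero_orr (long a, long b, long c)
{
  long z = a != 0 ? b : 0;  /* cmp + csel with xzr: a conditional zero */
  long t = b & c;           /* and */
  return t | z;             /* orr: (b & c) | b == b when a != 0 */
}
```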
>
> So what we have is targets which want two different approaches to the
> basic code generation strategy. Often we'd look to tackle this with a
> cost function. We could do that, but it'd mean one target is going to
> have to have combiner patterns (or simplify-rtx adjustments) for the
> case where the less efficient sequence works, but could be improved.
> Those are going to be *fugly* -- been there and you can see the evidence
> in zicond.md IIRC (assuming I upstreamed that).
>
> We can't really key on an optab as the RISC-V port claims to support a
> generalized conditional move via an expander that handles the
> generalized case, generating the appropriate code to handle the limited
> conditions as well as canonicalization of operands. Having that pattern
> isn't ideal, but it really helps as a fallback path for ifcvt
> transformations.
>
> I guess we could synthesize the two styles once, cost them, then use
> that result to guide expansions going forward, i.e., a prefer_czero vs.
> prefer_cmove kind of property, then test that in the czero_arith path,
> punting to the cmove_arith path if there's no benefit to the czero form
> (or active harm as we see above).
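A rough sketch of that idea, assuming the preference is decided once from the costs of two synthesized sequences and only consulted afterwards. All names are hypothetical (none is an existing GCC interface), and the costs are plain integers supplied by the caller; a real implementation would have to obtain them from the target's cost model.

```c
#include <stdbool.h>

/* Hypothetical target-level preference: "cost both styles once,
   then remember the answer".  */
enum ifcvt_pref { PREF_UNKNOWN, PREF_CZERO, PREF_CMOVE };

/* Costs of the two synthesized sequences, supplied by the caller.  */
struct ifcvt_costs
{
  int czero_cost;  /* e.g. li + czero.eqz + sll */
  int cmove_cost;  /* e.g. the generalized conditional-move sequence */
};

static enum ifcvt_pref cached_pref = PREF_UNKNOWN;

/* Compute the preference on first use and reuse it afterwards.
   Ties go to cmove: if czero brings no benefit, do not prefer it.  */
static enum ifcvt_pref
get_ifcvt_pref (const struct ifcvt_costs *costs)
{
  if (cached_pref == PREF_UNKNOWN)
    cached_pref = (costs->czero_cost < costs->cmove_cost
                   ? PREF_CZERO : PREF_CMOVE);
  return cached_pref;
}

/* Gate for the czero_arith path: returning false means "punt" so the
   cmove_arith path gets to try instead.  */
static bool
try_cond_zero_arith_p (const struct ifcvt_costs *costs)
{
  return get_ifcvt_pref (costs) == PREF_CZERO;
}
```

Because the result is cached, later calls return the first decision regardless of the costs passed in, which matches the "decide once, use going forward" idea.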
>
> Other ideas?
One idea: for the `AND` case, first check whether the conditional move
can handle `cond?a:-1`. That would fix up the one case that
seems to be the only really bad code generation.
I think having the other operation before or after the cmove should be
OK either way, and would fix up a different regression.
That also seems to be what LLVM does, and it seems like a good idea in general.
Thanks,
Andrew
>
>
> jeff
>