On 12/26/2025 1:32 PM, Andrew Pinski wrote:
On Fri, Dec 26, 2025 at 11:17 AM Jeffrey Law <[email protected]> wrote:

On 12/23/2025 11:54 PM, Andrew Pinski wrote:
On Tue, Dec 23, 2025 at 8:33 PM Jeffrey Law <[email protected]> wrote:
On 12/23/2025 6:44 PM, Andrew Pinski wrote:
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.

Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.

gcc/ChangeLog:

        * ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
        last.
Please no.   We very much want to use condzero_arith rather than cmove
based things -- that would be pretty bad in general for RISC-V.  We
really should dive into why the code isn't as good as we'd like on other
patforms.
I looked and I noticed noce_try_cmove_arith fails for riscv for most
(all?) of the testcases I tried.
We may not have good coverage here.   But it was a huge source of
performance issues with gcc-15 -- way too many generalized conditional
moves that should have been condzero style sequences.
I noticed in some cases noce_convert_multiple_sets happened even
before noce_try_cond_zero_arith had a chance to do its thing.
The code generation from noce_convert_multiple_sets is worse for riscv
for sure and noce_convert_multiple_sets happens even before
noce_try_cond_zero_arith and noce_try_cmove_arith could even happen.

Note looking into cases where noce_try_cond_zero_arith fails (and
noce_try_cmove_arith also fails) on riscv I find that the check:
```
    if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
      return false;
```
is too restrictive but that is a different story.
But that's capturing the key concept.  Namely that we want to target a
conditional zero idiom rather than a conditional move idiom.  RISC-V has
instructions for the former.  The latter requires a pair of conditional
zeros with opposite polarity on the test *and* an additional instruction
to select one of the outputs from the pair of conditional zeros.
I understand there are riscv instructions which handle `a!=0?b:c` but
the reason why I say it was too restrive was
because I noticed if we had:
```
long
test_ADD_ceqz_x (long x, long z, long c)
{
   if (c < 10)
     x = x + z;

   return x;
}
```
This would not cause a czero.eqz to be used.
I assumed we wanted here:
```
         slti    a2,a2,10
         czero.eqz       a2,a1,a2
         add     a0,a0,a2
         ret
```
But we get the branch version.
If I remove the check for REG/0, then I get the above.

Yea.   I'll need to look at why that gets converted with our internal tree, but not with the trunk.   Internally I get:

        li      a5,9
        sgt     a2,a2,a5
        czero.nez       a2,a1,a2
        add     a0,a0,a2

Not perfect, but no branch ;-)


To get it if-converted upstream we need to crank up the branch cost and we get a pesky conditional move based sequence:

        li      a5,9
        sgt     a2,a2,a5
        add     a1,a0,a1
        czero.nez       a1,a1,a2
        czero.eqz       a0,a0,a2
        add     a0,a0,a1

The combination of the two would tend to suggest it's mess of code we have internally for condzero_arith that's capturing more cases.  Not a huge surprise given the focus we put on that last spring.   As you said, it's a separate issue and we definitely want to address.

One idea is for the `AND` case, try to see if the conditional move
will handle `cond?a:-1` first. That would fixup the one case which
seems like the only really bad code generation.
I think having the other operation before or after the cmove should be
ok either way and would fix up a different regression.

That's basically what I was thinking to test once from a costing standpoint and caching the result.   Essentially having condzero_arith ask the target (via generating RTL and querying how the target handles it, either from a sequence length, cost, whatever) what's preferred.  I was going to use the AND case to drive the decision since that seems to be where the pain point shows up, and use the result of the AND as a proxy for all the cases where we want to avoid condzero_arith and instead utilize condmove_arith.

Jeff

Reply via email to