On 12/26/2025 1:32 PM, Andrew Pinski wrote:
On Fri, Dec 26, 2025 at 11:17 AM Jeffrey Law <[email protected]> wrote:
On 12/23/2025 11:54 PM, Andrew Pinski wrote:
On Tue, Dec 23, 2025 at 8:33 PM Jeffrey Law <[email protected]> wrote:
On 12/23/2025 6:44 PM, Andrew Pinski wrote:
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.
gcc/ChangeLog:
* ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
last.
Please no. We very much want to use condzero_arith rather than cmove
based things -- that would be pretty bad in general for RISC-V. We
really should dive into why the code isn't as good as we'd like on other
patforms.
I looked and I noticed noce_try_cmove_arith fails for riscv for most
(all?) of the testcases I tried.
We may not have good coverage here. But it was a huge source of
performance issues with gcc-15 -- way too many generalized conditional
moves that should have been condzero style sequences.
I noticed in some cases noce_convert_multiple_sets happened even
before noce_try_cond_zero_arith had a chance to do its thing.
The code generation from noce_convert_multiple_sets is worse for riscv
for sure and noce_convert_multiple_sets happens even before
noce_try_cond_zero_arith and noce_try_cmove_arith could even happen.
Note looking into cases where noce_try_cond_zero_arith fails (and
noce_try_cmove_arith also fails) on riscv I find that the check:
```
if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
return false;
```
is too restrictive but that is a different story.
But that's capturing the key concept. Namely that we want to target a
conditional zero idiom rather than a conditional move idiom. RISC-V has
instructions for the former. The latter requires a pair of conditional
zeros with opposite polarity on the test *and* an additional instruction
to select one of the outputs from the pair of conditional zeros.
I understand there are riscv instructions which handle `a!=0?b:c` but
the reason why I say it was too restrive was
because I noticed if we had:
```
long
test_ADD_ceqz_x (long x, long z, long c)
{
if (c < 10)
x = x + z;
return x;
}
```
This would not cause a czero.eqz to be used.
I assumed we wanted here:
```
slti a2,a2,10
czero.eqz a2,a1,a2
add a0,a0,a2
ret
```
But we get the branch version.
If I remove the check for REG/0, then I get the above.
Yea. I'll need to look at why that gets converted with our internal
tree, but not with the trunk. Internally I get:
li a5,9
sgt a2,a2,a5
czero.nez a2,a1,a2
add a0,a0,a2
Not perfect, but no branch ;-)
To get it if-converted upstream we need to crank up the branch cost and
we get a pesky conditional move based sequence:
li a5,9
sgt a2,a2,a5
add a1,a0,a1
czero.nez a1,a1,a2
czero.eqz a0,a0,a2
add a0,a0,a1
The combination of the two would tend to suggest it's mess of code we
have internally for condzero_arith that's capturing more cases. Not a
huge surprise given the focus we put on that last spring. As you said,
it's a separate issue and we definitely want to address.
One idea is for the `AND` case, try to see if the conditional move
will handle `cond?a:-1` first. That would fixup the one case which
seems like the only really bad code generation.
I think having the other operation before or after the cmove should be
ok either way and would fix up a different regression.
That's basically what I was thinking to test once from a costing
standpoint and caching the result. Essentially having condzero_arith
ask the target (via generating RTL and querying how the target handles
it, either from a sequence length, cost, whatever) what's preferred. I
was going to use the AND case to drive the decision since that seems to
be where the pain point shows up, and use the result of the AND as a
proxy for all the cases where we want to avoid condzero_arith and
instead utilize condmove_arith.
Jeff