On 10/30/23 01:25, Fei Gao wrote:
Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd
Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd
Co-authored-by: Xiao Zeng<zengx...@eswincomputing.com>
gcc/ChangeLog:
* ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
(noce_try_cond_zero_arith): handler for conditional zero op
(noce_process_if_block): add noce_try_cond_zero_arith with hook control
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zicond_ifcvt_opt.c: New test.
So the idea here is to improve upon the current code we generate for
conditional arithmetic. Right now we support conditional arithmetic
using Zicond, but the sequence is poor.
Basically the if-converter knows how to generate a conditional add, but
it does so in a way that isn't as efficient as it could be.
In effect ifcvt wants to generate
t = a + b;
res = cond ? t : a;
We want to change it to
t = cond ? b : 0;
res = a + t;
The latter sequence expands to more efficient code trivially for RISC-V.
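To make the two forms concrete, here is a minimal C sketch. It assumes the semantics from the patch description (rd = cond ? rs1 + rs2 : rs1) and models the Zicond instructions as plain functions. The helper names (czero_eqz, czero_nez, add_then_select, zero_then_add, zero_then_add_if_zero) are made up for illustration and are not part of ifcvt.cc or the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Software models of the Zicond instructions (illustrative helpers).
   czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
   czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1  */
static inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;
}

static inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;
}

/* First form: do the add unconditionally, then select. */
static inline uint64_t add_then_select(uint64_t a, uint64_t b, uint64_t cond)
{
    uint64_t t = a + b;
    return cond ? t : a;
}

/* Second form: zero the addend when cond is false, then add.
   This maps onto "czero.eqz; add".  */
static inline uint64_t zero_then_add(uint64_t a, uint64_t b, uint64_t cond)
{
    uint64_t t = czero_eqz(b, cond);
    return a + t;
}

/* The "add if zero" variant from the patch description maps onto
   "czero.nez; add".  */
static inline uint64_t zero_then_add_if_zero(uint64_t a, uint64_t b, uint64_t rc)
{
    return a + czero_nez(b, rc);
}
```

Both forms return the same value for every input. The second only needs one czero and one add.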
I wandered a bit through the combine dumps to see if it would be easy to
capture this class of cases. We never get anything useful, and while I
can imagine "bridge" patterns that would potentially expose enough RTL
to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.
So going back to ifcvt...
In the first sequence the addition must wait for both "a" and "b" to be
available and the conditional move can fire on the next cycle.
In the second sequence the conditional move can fire when just "b" is
available. So that gives "a" another cycle to become ready (say if it's
coming from memory or a multi-cycle operation like multiply).
On the other hand the second sequence does keep "a" live longer.
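A toy model of that timing argument, as a sketch only. It assumes every operation takes one cycle and can issue as soon as its inputs are ready, which is not a description of any real pipeline. The function names and the ready-cycle parameters (ra, rb, rc for "a", "b" and the condition) are hypothetical.

```c
#include <assert.h>

static inline int max_int(int x, int y)
{
    return x > y ? x : y;
}

/* First sequence: the add waits on "a" and "b", then the conditional
   move waits on the add result and the condition.  */
static inline int ready_add_then_select(int ra, int rb, int rc)
{
    int t = max_int(ra, rb) + 1;
    return max_int(t, rc) + 1;
}

/* Second sequence: the conditional zero waits on "b" and the condition,
   then the add waits on that result and "a".  */
static inline int ready_zero_then_add(int ra, int rb, int rc)
{
    int t = max_int(rb, rc) + 1;
    return max_int(t, ra) + 1;
}
```

With "a" arriving at cycle 3 and everything else at cycle 0, this model finishes the first sequence at cycle 5 and the second at cycle 4. If the condition is the late input instead, the result flips: 4 for the first sequence and 5 for the second.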
In the end I strongly suspect neither sequence is significantly better
than the other. Meaning I don't think we need to conditionalize using
condzero arith at all.
I'll note that subsequent patches add MINUS, IOR, XOR and AND. It's
also possible (and important) to handle shifts. There's a conditional
shift-by-6 in leela's hot path.
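For reference, a rough C sketch of how the same trick would extend to those operations. Zero is the identity for MINUS, IOR, XOR and the shift count, so a single conditional zero on the second operand is enough. AND is different because zero is not its identity. The form shown for it is one possible lowering and is an assumption here, not something taken from the patches. All function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Software models of czero.eqz / czero.nez (illustrative helpers). */
static inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;
}

static inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;
}

/* cond ? a - b : a */
static inline uint64_t cond_sub(uint64_t a, uint64_t b, uint64_t cond)
{
    return a - czero_eqz(b, cond);
}

/* cond ? a | b : a */
static inline uint64_t cond_ior(uint64_t a, uint64_t b, uint64_t cond)
{
    return a | czero_eqz(b, cond);
}

/* cond ? a ^ b : a */
static inline uint64_t cond_xor(uint64_t a, uint64_t b, uint64_t cond)
{
    return a ^ czero_eqz(b, cond);
}

/* cond ? a << amount : a, with amount assumed to be less than 64 */
static inline uint64_t cond_shl(uint64_t a, uint64_t amount, uint64_t cond)
{
    return a << czero_eqz(amount, cond);
}

/* cond ? a & b : a.  Zeroing b would give 0 rather than a, so this
   sketch ORs "a" back in when cond is false (assumed lowering).  */
static inline uint64_t cond_and(uint64_t a, uint64_t b, uint64_t cond)
{
    return (a & b) | czero_nez(a, cond);
}
```

The conditional shift-by-6 case would be cond_shl(x, 6, cond) in this sketch.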
Overall this looks a lot like the VRULL code, but just less complete.
My inclination is to do a cleanup pass on the VRULL code, verify it
handles all the cases in your tests, and commit the VRULL implementation
with your tests.
I'll do some further poking at this today. Thanks for re-submitting
these bits. Getting this target independent work cleaned up has been on
my TODO for a while now.
jeff