Hi Suwa-san,

On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
<[email protected]> wrote:
>
> This patch optimizes the boolean evaluation of EQ/NE against zero
> by adding two insn_and_split patterns similar to SImode conditional
> store:
>
> "eq_zero":
> op0 = (op1 == 0) ? 1 : 0;
> op0 = clz(op1) >> 5; /* optimized (requires TARGET_NSA) */
>
> "movsicc_ne0_reg_zero":
> op0 = (op1 != 0) ? op2 : 0;
> op0 = op2; if (op1 == 0) op0 = op1; /* optimized */
>
> /* example #1 */
> int bool_eqSI(int x) {
> return x == 0;
> }
> int bool_neSI(int x) {
> return x != 0;
> }
>
> ;; after (TARGET_NSA)
> bool_eqSI:
> nsau a2, a2
> srli a2, a2, 5
> ret.n
> bool_neSI:
> mov.n a9, a2
> movi.n a2, 1
> moveqz a2, a9, a9
> ret.n
>
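
For readers less familiar with NSAU, the eq_zero trick can be checked
with a small C model. This is only a sketch: nsau_model and
eq_zero_model are names made up here, and the loop merely imitates the
instruction's behaviour of returning 32 for a zero input.

```c
#include <stdint.h>

/* Hypothetical model of NSAU: number of leading zero bits,
   with 32 returned for a zero input.  */
static unsigned nsau_model(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of the "eq_zero" sequence: nsau followed by srli by 5.  */
static int eq_zero_model(uint32_t x)
{
    return (int)(nsau_model(x) >> 5);
}
```

Only a zero input yields a count of 32, which is the only possible
result with bit 5 set, so the shift by 5 gives exactly (x == 0).
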
> These also work in SFmode by ignoring their sign bits, and
> furthermore, the branch if EQ/NE against zero in SFmode is also done
> in the same manner.
>
> The reasons for this optimization in SFmode are:
>
> - Only zero values (negative or non-negative) have all bits of both
> the exponent and the mantissa clear.
> - EQ/NE comparisons involving NaNs produce no signal even if they
> are signaling.
> - Even if the use of an IEEE 754 single-precision floating-point
> coprocessor is configured (TARGET_HARD_FLOAT is true), the
> comparison still requires:
> 1. Load zero value to FP register
> 2. Possibly, an additional FP move if the comparison target is
> in an address register
> 3. FP equality check instruction
> 4. Read the boolean register containing the result, or a
> conditional branch
> As listed above, a considerable number of instructions are still
> generated.
>
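
The SFmode reasoning above can be modelled in C as follows. This is a
sketch under the assumption that float is a 32-bit IEEE 754 binary32
value; sf_from_bits and sf_eq_zero_model are hypothetical helper names
and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its raw bit pattern.  */
static float sf_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model of "add.n a2, a2, a2" followed by a test against zero:
   doubling the raw bits shifts the sign bit out, so the result is
   zero only when the exponent and mantissa are all zero.  */
static int sf_eq_zero_model(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits + bits) == 0;
}
```

Both +0.0 and -0.0 map to 1, while NaNs and denormals have at least
one exponent or mantissa bit set and map to 0, which matches what
x == 0 is expected to return.
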
> /* example #2 */
> int bool_eqSF(float x) {
> return x == 0;
> }
> int bool_neSF(float x) {
> return x != 0;
> }
> int bool_ltSF(float x) {
> return x < 0;
> }
> extern void foo(void);
> void cb_eqSF(float x) {
> if(x != 0)
> foo();
> }
> void cb_neSF(float x) {
> if(x == 0)
> foo();
> }
> void cb_geSF(float x) {
> if(x < 0)
> foo();
> }
>
> ;; after
> ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> bool_eqSF:
> add.n a2, a2, a2
> nsau a2, a2
> srli a2, a2, 5
> ret.n
> bool_neSF:
> add.n a9, a2, a2
> movi.n a2, 1
> moveqz a2, a9, a9
> ret.n
> bool_ltSF:
> movi.n a9, 0
> wfr f0, a2
> wfr f1, a9
> olt.s b0, f0, f1
> movi.n a9, 0
> movi.n a2, 1
> movf a2, a9, b0
> ret.n
> cb_eqSF:
> add.n a2, a2, a2
> beqz.n a2, .L6
> j.l foo, a9
> .L6:
> ret.n
> cb_neSF:
> add.n a2, a2, a2
> bnez.n a2, .L8
> j.l foo, a9
> .L8:
> ret.n
> cb_geSF:
> addi sp, sp, -16
> movi.n a3, 0
> s32i.n a12, sp, 8
> s32i.n a0, sp, 12
> mov.n a12, a2
> call0 __unordsf2
> bnez.n a2, .L10
> movi.n a3, 0
> mov.n a2, a12
> call0 __gesf2
> bnei a2, -1, .L10
> l32i.n a0, sp, 12
> l32i.n a12, sp, 8
> addi sp, sp, 16
> j.l foo, a9
> .L10:
> l32i.n a0, sp, 12
> l32i.n a12, sp, 8
> addi sp, sp, 16
> ret.n
>
> gcc/ChangeLog:
>
> * config/xtensa/predicates.md (const_float_0_operand):
> Rename from obsolete "const_float_1_operand" and change the
> constant to compare.
> (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> New.
> * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> Add code for EQ/NE comparison with constant zero in SFmode.
> (xtensa_expand_scc): Add code to derive boolean evaluation
> of EQ/NE with constant zero for comparison in SFmode.
> (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> zero inside "cbranchsf4" to 0.
> * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> Change "match_operator" and the third "match_operand" to the
> ones mentioned above.
> (movsicc_ne0_reg_zero, eq_zero): New.
> ---
> gcc/config/xtensa/predicates.md | 17 +++++++++--
> gcc/config/xtensa/xtensa.cc | 45 ++++++++++++++++++++++++++++
> gcc/config/xtensa/xtensa.md | 53 +++++++++++++++++++++++++++++----
> 3 files changed, 106 insertions(+), 9 deletions(-)

This version performs much better than v1, but there is still a new
testsuite failure in gcc.c-torture/execute/bitfld-3.c. The following
change in the generated code, from:
l32i.n a11, a7, 8
l8ui a9, a7, 12
movi a10, 0xff
add.n a9, a9, a10
addi.n a7, a11, -1
movi.n a10, 1
movi.n a6, 0
moveqz a10, a6, a11
to:
l32i.n a10, a7, 8
l8ui a9, a7, 12
movi a11, 0xff
add.n a9, a9, a11
addi.n a7, a10, -1
movi.n a11, 1
mov.n a10, a11
movnez a10, a11, a11
suggests that the pattern movsicc_ne0_reg_zero does not work correctly
when its operands overlap.
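
To illustrate the suspected problem, here is a minimal C model of the
two-step sequence from the patch description (op0 = op2; if (op1 == 0)
op0 = op1). It is only a sketch: the register file is a plain array,
and movsicc_ref and movsicc_split_model are hypothetical names, not
code from the patch.

```c
/* Reference semantics: op0 = (op1 != 0) ? op2 : 0.  */
static int movsicc_ref(int op1, int op2)
{
    return op1 != 0 ? op2 : 0;
}

/* Hypothetical model of the split sequence on a register file;
   r0, r1 and r2 are the register numbers assigned to op0, op1
   and op2.  */
static void movsicc_split_model(int regs[], int r0, int r1, int r2)
{
    regs[r0] = regs[r2];        /* mov    op0, op2      */
    if (regs[r1] == 0)          /* moveqz op0, op1, op1 */
        regs[r0] = regs[r1];
}
```

With distinct registers the model agrees with the reference. When op0
and op1 share a register, the first move overwrites op1 before it is
tested: for op1 = 0 and op2 = 1 the model leaves 1 in op0, whereas the
reference result is 0. An earlyclobber on the output operand is one
common way to prevent such an allocation, but I have not verified that
it is the right fix here.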
--
Thanks.
-- Max