On Mon, Jun 5, 2023 at 8:15 AM Max Filippov <[email protected]> wrote:
>
> Hi Suwa-san,
>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
> <[email protected]> wrote:
> >
> > This patch optimizes the boolean evaluation of EQ/NE against zero
> > by adding two insn_and_split patterns similar to SImode conditional
> > store:
> >
> > "eq_zero":
> > op0 = (op1 == 0) ? 1 : 0;
> > op0 = clz(op1) >> 5; /* optimized (requires TARGET_NSA) */
> >
> > "movsicc_ne0_reg_zero":
> > op0 = (op1 != 0) ? op2 : 0;
> > op0 = op2; if (op1 == 0) op0 = op1; /* optimized */
> >
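For readers following along, the two rewrites above can be checked on a host machine with a rough C model. This is only a sketch: nsau_model() is a hypothetical stand-in for the NSAU instruction (assumed to return 32 for a zero input), and the *_model function names are made up for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for Xtensa NSAU: count of leading zero
   bits, assumed to yield 32 when the input is 0. */
static unsigned nsau_model(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": op0 = (op1 == 0) ? 1 : 0 */
static uint32_t eq_zero_model(uint32_t op1)
{
    /* Results range over 0..32, and only 32 has bit 5 set. */
    return nsau_model(op1) >> 5;
}

/* "movsicc_ne0_reg_zero": op0 = (op1 != 0) ? op2 : 0 */
static uint32_t movsicc_ne0_reg_zero_model(uint32_t op1, uint32_t op2)
{
    uint32_t op0 = op2;

    if (op1 == 0)   /* models MOVEQZ: move when the test register is zero */
        op0 = op1;  /* op1 is known to be 0 here */
    return op0;
}

/* Compare both models against the plain C expressions they replace. */
static void check_int_models(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];

        assert(eq_zero_model(x) == (uint32_t)(x == 0));
        assert(movsicc_ne0_reg_zero_model(x, 1u) == (uint32_t)(x != 0));
        assert(movsicc_ne0_reg_zero_model(x, 42u) == (x != 0 ? 42u : 0u));
    }
}
```

Passing 1 as op2 gives the boolean "x != 0" form that appears as bool_neSI in example #1.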
> > /* example #1 */
> > int bool_eqSI(int x) {
> > return x == 0;
> > }
> > int bool_neSI(int x) {
> > return x != 0;
> > }
> >
> > ;; after (TARGET_NSA)
> > bool_eqSI:
> > nsau a2, a2
> > srli a2, a2, 5
> > ret.n
> > bool_neSI:
> > mov.n a9, a2
> > movi.n a2, 1
> > moveqz a2, a9, a9
> > ret.n
> >
> > These also work in SFmode by ignoring the sign bit, and a
> > conditional branch on EQ/NE against zero in SFmode is handled in
> > the same manner.
> >
> > The reasons for this optimization in SFmode are:
> >
> > - Only zero values (negative or non-negative) have no 1 bits in
> > both the exponent and the mantissa.
> > - EQ/NE comparisons involving NaNs produce no signal even if they
> > are signaling.
> > - Even if an IEEE 754 single-precision floating-point coprocessor
> > is configured (TARGET_HARD_FLOAT is true), the comparison still
> > requires:
> > 1. Loading a zero value into an FP register
> > 2. Possibly, an additional FP move if the comparison target is
> > in an address register
> > 3. An FP equality check instruction
> > 4. Reading the boolean register containing the result, or a
> > conditional branch
> > That is, a considerable number of instructions are still
> > generated.
> >
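The sign-bit argument above can be sanity-checked on a host with a small C sketch. It assumes float is IEEE 754 binary32 and that the host does not flush anything to zero; float_bits() and sf_eq_zero_bits() are hypothetical helper names, not code from the patch.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (assumes binary32 layout). */
static uint32_t float_bits(float x)
{
    uint32_t u;

    memcpy(&u, &x, sizeof u);
    return u;
}

/* Integer-only test for "x == 0": shifting left by one drops the sign
   bit, which is what "add.n a2, a2, a2" does in the generated code. */
static int sf_eq_zero_bits(float x)
{
    return (uint32_t)(float_bits(x) << 1) == 0;
}

/* Compare the integer test with the real FP comparison on a few values,
   including both zeros, infinities and a quiet NaN. */
static void check_sf_models(void)
{
    const float samples[] = {
        0.0f, -0.0f, 1.0f, -1.0f, FLT_MIN, -FLT_MIN,
        INFINITY, -INFINITY, NAN
    };
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        float x = samples[i];

        assert(sf_eq_zero_bits(x) == (x == 0.0f));
        assert(!sf_eq_zero_bits(x) == (x != 0.0f));
    }
}
```

This only covers the value equivalence. It says nothing about exception flags, which the second bullet above addresses separately.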
> > /* example #2 */
> > int bool_eqSF(float x) {
> > return x == 0;
> > }
> > int bool_neSF(float x) {
> > return x != 0;
> > }
> > int bool_ltSF(float x) {
> > return x < 0;
> > }
> > extern void foo(void);
> > void cb_eqSF(float x) {
> > if(x != 0)
> > foo();
> > }
> > void cb_neSF(float x) {
> > if(x == 0)
> > foo();
> > }
> > void cb_geSF(float x) {
> > if(x < 0)
> > foo();
> > }
> >
> > ;; after
> > ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> > bool_eqSF:
> > add.n a2, a2, a2
> > nsau a2, a2
> > srli a2, a2, 5
> > ret.n
> > bool_neSF:
> > add.n a9, a2, a2
> > movi.n a2, 1
> > moveqz a2, a9, a9
> > ret.n
> > bool_ltSF:
> > movi.n a9, 0
> > wfr f0, a2
> > wfr f1, a9
> > olt.s b0, f0, f1
> > movi.n a9, 0
> > movi.n a2, 1
> > movf a2, a9, b0
> > ret.n
> > cb_eqSF:
> > add.n a2, a2, a2
> > beqz.n a2, .L6
> > j.l foo, a9
> > .L6:
> > ret.n
> > cb_neSF:
> > add.n a2, a2, a2
> > bnez.n a2, .L8
> > j.l foo, a9
> > .L8:
> > ret.n
> > cb_geSF:
> > addi sp, sp, -16
> > movi.n a3, 0
> > s32i.n a12, sp, 8
> > s32i.n a0, sp, 12
> > mov.n a12, a2
> > call0 __unordsf2
> > bnez.n a2, .L10
> > movi.n a3, 0
> > mov.n a2, a12
> > call0 __gesf2
> > bnei a2, -1, .L10
> > l32i.n a0, sp, 12
> > l32i.n a12, sp, 8
> > addi sp, sp, 16
> > j.l foo, a9
> > .L10:
> > l32i.n a0, sp, 12
> > l32i.n a12, sp, 8
> > addi sp, sp, 16
> > ret.n
> >
> > gcc/ChangeLog:
> >
> > * config/xtensa/predicates.md (const_float_0_operand):
> > Rename from obsolete "const_float_1_operand" and change the
> > constant to compare.
> > (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> > New.
> > * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> > Add code for EQ/NE comparison with constant zero in SFmode.
> > (xtensa_expand_scc): Add code to derive boolean evaluation
> > of EQ/NE with constant zero for comparison in SFmode.
> > (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> > zero inside "cbranchsf4" to 0.
> > * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> > Change "match_operator" and the third "match_operand" to the
> > ones mentioned above.
> > (movsicc_ne0_reg_zero, eq_zero): New.
> > ---
> > gcc/config/xtensa/predicates.md | 17 +++++++++--
> > gcc/config/xtensa/xtensa.cc | 45 ++++++++++++++++++++++++++++
> > gcc/config/xtensa/xtensa.md | 53 +++++++++++++++++++++++++++++----
> > 3 files changed, 106 insertions(+), 9 deletions(-)
>
> This version performs much better than v1, but there's still a new
> testsuite failure in gcc.c-torture/execute/bitfld-3.c.
On the configuration with an FPU there is one more new failure,
in g++.dg/opt/pr58864.C, with the following ICE:
gcc/testsuite/g++.dg/opt/pr58864.C:21:1: error: unrecognizable insn:
(insn 13 12 14 2 (set (reg:CC 18 b0)
(eq:CC (reg/v:SF 43 [ c ])
(const_double:SF 0.0 [0x0.0p+0]))) -1
(nil))
during RTL pass: vregs
--
Thanks.
-- Max