Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

Max Filippov via Gcc-patches Mon, 05 Jun 2023 17:28:27 -0700

On Mon, Jun 5, 2023 at 8:15 AM Max Filippov <jcmvb...@gmail.com> wrote:
>
> Hi Suwa-san,
>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
> <jjsuwa_sys3...@yahoo.co.jp> wrote:
> >
> > This patch optimizes the boolean evaluation of EQ/NE against zero
> > by adding two insn_and_split patterns similar to SImode conditional
> > store:
> >
> > "eq_zero":
> >         op0 = (op1 == 0) ? 1 : 0;
> >         op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
> >
> > "movsicc_ne0_reg_zero":
> >         op0 = (op1 != 0) ? op2 : 0;
> >         op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
> >
> >     /* example #1 */
> >     int bool_eqSI(int x) {
> >       return x == 0;
> >     }
> >     int bool_neSI(int x) {
> >       return x != 0;
> >     }
> >
> >     ;; after (TARGET_NSA)
> >     bool_eqSI:
> >         nsau    a2, a2
> >         srli    a2, a2, 5
> >         ret.n
> >     bool_neSI:
> >         mov.n   a9, a2
> >         movi.n  a2, 1
> >         moveqz  a2, a9, a9
> >         ret.n
> >
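The two rewrites above can be checked with a small C model. This is only a sketch: nsau_model() is a hypothetical software stand-in for the NSAU instruction (assumed here to return the number of leading zero bits, and 32 for a zero input), and the other function names are made up for illustration; none of this is code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of NSAU: count leading zero bits,
   yielding 32 when the input is zero (assumption stated above).  */
static unsigned nsau_model(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": only the value 32 has bit 5 set, so shifting the
   leading-zero count right by 5 gives 1 for zero and 0 otherwise.  */
static int eq_zero_model(uint32_t op1)
{
    return (int)(nsau_model(op1) >> 5);
}

/* "movsicc_ne0_reg_zero": copy op2 first, then overwrite with op1
   (which is known to be 0) when op1 is zero, as MOVEQZ would.  */
static uint32_t movsicc_ne0_reg_zero_model(uint32_t op1, uint32_t op2)
{
    uint32_t op0 = op2;
    if (op1 == 0)
        op0 = op1;
    return op0;
}

/* Compare both models against the plain C expressions they replace.  */
static void check_models(uint32_t x, uint32_t y)
{
    assert(eq_zero_model(x) == (x == 0));
    assert(movsicc_ne0_reg_zero_model(x, y) == ((x != 0) ? y : 0));
}
```

Calling check_models() with a few values of x (zero, one, a value with only the top bit set) exercises both branches of each model.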
> > These also work in SFmode by ignoring the sign bit, and furthermore,
> > a branch on EQ/NE against zero in SFmode is handled in the same
> > manner.
> >
> > The reasons for this optimization in SFmode are:
> >
> >   - Only zero values (positive or negative) have no 1 bits in
> >     both the exponent and the mantissa.
> >   - EQ/NE comparisons involving NaNs produce no signal even if they
> >     are signaling.
> >   - Even when an IEEE 754 single-precision floating-point
> >     coprocessor is configured (TARGET_HARD_FLOAT is true), the
> >     comparison still requires:
> >         1. Load zero value to FP register
> >         2. Possibly, additional FP move if the comparison target is
> >            an address register
> >         3. FP equality check instruction
> >         4. Read the boolean register containing the result, or
> >            conditional branch
> >     As the steps above show, a considerable number of instructions
> >     are still generated.
> >
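The first reason in the list above can be illustrated on a host machine. The following is a minimal sketch, assuming that float is IEEE 754 binary32 on the host; the helper names are hypothetical and not taken from the patch. Shifting the raw bits left by one discards the sign bit (the same effect as the add.n a2, a2, a2 in the generated code), and what remains is zero only for +0.0 and -0.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern (assumes binary32).  */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Build a float from a raw 32-bit pattern (assumes binary32).  */
static float float_from_bits(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Integer-only model of "x == 0" in SFmode: drop the sign bit by
   shifting left once, then test exponent and mantissa for zero.  */
static int sf_eq_zero_model(float x)
{
    return (uint32_t)(float_bits(x) << 1) == 0;
}

/* Compare the model with the real floating-point comparison.  */
static void check_sf_model(uint32_t bits)
{
    float x = float_from_bits(bits);
    assert(sf_eq_zero_model(x) == (x == 0.0f));
    assert(!sf_eq_zero_model(x) == (x != 0.0f));
}
```

Useful inputs for check_sf_model() include 0x00000000 (+0.0), 0x80000000 (-0.0), 0x3f800000 (1.0), 0x7f800000 (infinity) and 0x7fc00000 (a quiet NaN); a NaN compares unequal to zero and also has non-zero exponent bits, so the two sides agree.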
> >     /* example #2 */
> >     int bool_eqSF(float x) {
> >       return x == 0;
> >     }
> >     int bool_neSF(float x) {
> >       return x != 0;
> >     }
> >     int bool_ltSF(float x) {
> >       return x < 0;
> >     }
> >     extern void foo(void);
> >     void cb_eqSF(float x) {
> >       if(x != 0)
> >         foo();
> >     }
> >     void cb_neSF(float x) {
> >       if(x == 0)
> >         foo();
> >     }
> >     void cb_geSF(float x) {
> >       if(x < 0)
> >         foo();
> >     }
> >
> >     ;; after
> >     ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> >     bool_eqSF:
> >         add.n   a2, a2, a2
> >         nsau    a2, a2
> >         srli    a2, a2, 5
> >         ret.n
> >     bool_neSF:
> >         add.n   a9, a2, a2
> >         movi.n  a2, 1
> >         moveqz  a2, a9, a9
> >         ret.n
> >     bool_ltSF:
> >         movi.n  a9, 0
> >         wfr     f0, a2
> >         wfr     f1, a9
> >         olt.s   b0, f0, f1
> >         movi.n  a9, 0
> >         movi.n  a2, 1
> >         movf    a2, a9, b0
> >         ret.n
> >     cb_eqSF:
> >         add.n   a2, a2, a2
> >         beqz.n  a2, .L6
> >         j.l     foo, a9
> >     .L6:
> >         ret.n
> >     cb_neSF:
> >         add.n   a2, a2, a2
> >         bnez.n  a2, .L8
> >         j.l     foo, a9
> >     .L8:
> >         ret.n
> >     cb_geSF:
> >         addi    sp, sp, -16
> >         movi.n  a3, 0
> >         s32i.n  a12, sp, 8
> >         s32i.n  a0, sp, 12
> >         mov.n   a12, a2
> >         call0   __unordsf2
> >         bnez.n  a2, .L10
> >         movi.n  a3, 0
> >         mov.n   a2, a12
> >         call0   __gesf2
> >         bnei    a2, -1, .L10
> >         l32i.n  a0, sp, 12
> >         l32i.n  a12, sp, 8
> >         addi    sp, sp, 16
> >         j.l     foo, a9
> >     .L10:
> >         l32i.n  a0, sp, 12
> >         l32i.n  a12, sp, 8
> >         addi    sp, sp, 16
> >         ret.n
> >
> > gcc/ChangeLog:
> >
> >         * config/xtensa/predicates.md (const_float_0_operand):
> >         Rename from obsolete "const_float_1_operand" and change the
> >         constant to compare.
> >         (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> >         New.
> >         * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> >         Add code for EQ/NE comparison with constant zero in SFmode.
> >         (xtensa_expand_scc): Add code to derive boolean evaluation
> >         of EQ/NE with constant zero for comparison in SFmode.
> >         (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> >         zero inside "cbranchsf4" to 0.
> >         * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> >         Change "match_operator" and the third "match_operand" to the
> >         ones mentioned above.
> >         (movsicc_ne0_reg_zero, eq_zero): New.
> > ---
> >  gcc/config/xtensa/predicates.md | 17 +++++++++--
> >  gcc/config/xtensa/xtensa.cc     | 45 ++++++++++++++++++++++++++++
> >  gcc/config/xtensa/xtensa.md     | 53 +++++++++++++++++++++++++++++----
> >  3 files changed, 106 insertions(+), 9 deletions(-)
>
> This version performs much better than v1, but there's still a new
> testsuite failure in gcc.c-torture/execute/bitfld-3.c


And on the config with FPU there's one more new failure
in g++.dg/opt/pr58864.C, with the following ICE:

gcc/testsuite/g++.dg/opt/pr58864.C:21:1: error: unrecognizable insn:
(insn 13 12 14 2 (set (reg:CC 18 b0)
       (eq:CC (reg/v:SF 43 [ c ])
           (const_double:SF 0.0 [0x0.0p+0]))) -1
    (nil))
during RTL pass: vregs

-- 
Thanks.
-- Max
