Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

Max Filippov via Gcc-patches Mon, 05 Jun 2023 08:15:47 -0700

Hi Suwa-san,

On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
<jjsuwa_sys3...@yahoo.co.jp> wrote:
>
> This patch optimizes the boolean evaluation of EQ/NE against zero
> by adding two insn_and_split patterns similar to SImode conditional
> store:
>
> "eq_zero":
>         op0 = (op1 == 0) ? 1 : 0;
>         op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
>
> "movsicc_ne0_reg_zero":
>         op0 = (op1 != 0) ? op2 : 0;
>         op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
>
>     /* example #1 */
>     int bool_eqSI(int x) {
>       return x == 0;
>     }
>     int bool_neSI(int x) {
>       return x != 0;
>     }
>
>     ;; after (TARGET_NSA)
>     bool_eqSI:
>         nsau    a2, a2
>         srli    a2, a2, 5
>         ret.n
>     bool_neSI:
>         mov.n   a9, a2
>         movi.n  a2, 1
>         moveqz  a2, a9, a9
>         ret.n
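
For readers following along, the two identities above can be checked with
a small C model. This is only a sketch: nsau32() is a hypothetical portable
stand-in for the NSAU instruction (assumed to return 32 for a zero input),
and the function names are made up for illustration, not taken from the
patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for Xtensa NSAU: number of leading zero bits of x,
   assumed to be 32 when x == 0. */
static unsigned nsau32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": (x == 0) ? 1 : 0 computed as nsau(x) >> 5.
   Only the count 32 has bit 5 set, so the shift yields 1 only for x == 0. */
static int eq_zero(uint32_t x)
{
    return (int)(nsau32(x) >> 5);
}

/* "movsicc_ne0_reg_zero": (op1 != 0) ? op2 : 0 as a move plus a
   conditional move of op1, which is known to be zero when it is taken. */
static int32_t ne0_reg_zero(int32_t op1, int32_t op2)
{
    int32_t op0 = op2;

    if (op1 == 0)
        op0 = op1;
    return op0;
}
```

With op2 set to 1, ne0_reg_zero() is the bool_neSI case shown above.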
>
> These also work in SFmode by ignoring the sign bit. Furthermore,
> branches on EQ/NE against zero in SFmode are handled in the same
> manner.
>
> The reasons for this optimization in SFmode are:
>
>   - Only zero values (positive or negative) have all exponent and
>     mantissa bits clear.
>   - EQ/NE comparisons involving NaNs produce no signal even if they
>     are signaling.
>   - Even when an IEEE 754 single-precision floating-point coprocessor
>     is configured (TARGET_HARD_FLOAT is true), the comparison still
>     takes a considerable number of instructions:
>         1. Load the zero value into an FP register
>         2. Possibly, an additional FP move if the comparison target is
>            in an address register
>         3. FP equality check instruction
>         4. Read the boolean register containing the result, or
>            conditional branch
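
The sign-bit argument can be sketched in C as follows, assuming float is
the 32-bit IEEE 754 binary32 format. float_bits() and sf_eq_zero() are
illustrative names, not part of the patch; the addition mirrors what an
"add.n a2, a2, a2" on the raw bits would do.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern (assumes binary32). */
static uint32_t float_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* x == 0 evaluated on the integer side: doubling the bit pattern shifts
   the sign bit out, so +0.0 and -0.0 both become 0, while any value with
   an exponent or mantissa bit set (including NaNs) stays nonzero. */
static int sf_eq_zero(float x)
{
    uint32_t bits = float_bits(x);

    return (uint32_t)(bits + bits) == 0;
}
```

Denormals have a nonzero mantissa, so they are still reported as nonzero,
which matches the result of the FP comparison.
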
>
>     /* example #2 */
>     int bool_eqSF(float x) {
>       return x == 0;
>     }
>     int bool_neSF(float x) {
>       return x != 0;
>     }
>     int bool_ltSF(float x) {
>       return x < 0;
>     }
>     extern void foo(void);
>     void cb_eqSF(float x) {
>       if(x != 0)
>         foo();
>     }
>     void cb_neSF(float x) {
>       if(x == 0)
>         foo();
>     }
>     void cb_geSF(float x) {
>       if(x < 0)
>         foo();
>     }
>
>     ;; after
>     ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
>     bool_eqSF:
>         add.n   a2, a2, a2
>         nsau    a2, a2
>         srli    a2, a2, 5
>         ret.n
>     bool_neSF:
>         add.n   a9, a2, a2
>         movi.n  a2, 1
>         moveqz  a2, a9, a9
>         ret.n
>     bool_ltSF:
>         movi.n  a9, 0
>         wfr     f0, a2
>         wfr     f1, a9
>         olt.s   b0, f0, f1
>         movi.n  a9, 0
>         movi.n  a2, 1
>         movf    a2, a9, b0
>         ret.n
>     cb_eqSF:
>         add.n   a2, a2, a2
>         beqz.n  a2, .L6
>         j.l     foo, a9
>     .L6:
>         ret.n
>     cb_neSF:
>         add.n   a2, a2, a2
>         bnez.n  a2, .L8
>         j.l     foo, a9
>     .L8:
>         ret.n
>     cb_geSF:
>         addi    sp, sp, -16
>         movi.n  a3, 0
>         s32i.n  a12, sp, 8
>         s32i.n  a0, sp, 12
>         mov.n   a12, a2
>         call0   __unordsf2
>         bnez.n  a2, .L10
>         movi.n  a3, 0
>         mov.n   a2, a12
>         call0   __gesf2
>         bnei    a2, -1, .L10
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         addi    sp, sp, 16
>         j.l     foo, a9
>     .L10:
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         addi    sp, sp, 16
>         ret.n
>
> gcc/ChangeLog:
>
>         * config/xtensa/predicates.md (const_float_0_operand):
>         Rename from obsolete "const_float_1_operand" and change the
>         constant to compare.
>         (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
>         New.
>         * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
>         Add code for EQ/NE comparison with constant zero in SFmode.
>         (xtensa_expand_scc): Add code to derive boolean evaluation
>         of EQ/NE with constant zero for comparison in SFmode.
>         (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
>         zero inside "cbranchsf4" to 0.
>         * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
>         Change "match_operator" and the third "match_operand" to the
>         ones mentioned above.
>         (movsicc_ne0_reg_zero, eq_zero): New.
> ---
>  gcc/config/xtensa/predicates.md | 17 +++++++++--
>  gcc/config/xtensa/xtensa.cc     | 45 ++++++++++++++++++++++++++++
>  gcc/config/xtensa/xtensa.md     | 53 +++++++++++++++++++++++++++++----
>  3 files changed, 106 insertions(+), 9 deletions(-)


This version performs much better than v1, but there is still a new
testsuite failure in gcc.c-torture/execute/bitfld-3.c. The following
change in the generated code, from:

       l32i.n  a11, a7, 8
       l8ui    a9, a7, 12
       movi    a10, 0xff
       add.n   a9, a9, a10
       addi.n  a7, a11, -1
       movi.n  a10, 1
       movi.n  a6, 0
       moveqz  a10, a6, a11

to:

       l32i.n  a10, a7, 8
       l8ui    a9, a7, 12
       movi    a11, 0xff
       add.n   a9, a9, a11
       addi.n  a7, a10, -1
       movi.n  a11, 1
       mov.n   a10, a11
       movnez  a10, a11, a11

suggests that the pattern movsicc_ne0_reg_zero does not work correctly
when its operands overlap.
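
To illustrate the kind of hazard I mean, here is a rough C model of the
two-step sequence from the patch description ("op0 = op2; if (op1 == 0)
op0 = op1") running on a small register array. It is only a sketch under
my own assumptions: the helper names are made up, and it models the
sequence as described in the commit message, not the actual split code in
xtensa.md.

```c
#include <stdint.h>

/* Reference semantics: result = (cond != 0) ? src : 0. */
static int32_t ne0_select_ref(int32_t cond, int32_t src)
{
    return cond != 0 ? src : 0;
}

/* Two-step expansion, with operands given as register indices so that
   the destination may name the same register as the condition. */
static void ne0_select_split(int32_t regs[], int dst, int cond, int src)
{
    regs[dst] = regs[src];        /* step 1: op0 = op2 */
    if (regs[cond] == 0)          /* step 2: conditional move of op1 */
        regs[dst] = regs[cond];
}
```

When dst and cond are distinct registers the two functions agree. When
they are the same register, step 1 overwrites the condition before step 2
tests it, so a zero condition with a nonzero source yields the source
instead of 0.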

-- 
Thanks.
-- Max
