"Roger Sayle" <ro...@nextmovesoftware.com> writes:
> Many thanks to Segher and Richard for pointing out that my removal
> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version
> of this patch was incorrect, and my assumption that these would be 
> subsumed by val_signbit_known_clear_p was mistaken.  That the
> tests for ABS and FFS looked out of place, was not an indication that
> they were not required, but that we were missing simplifications for
> the related SS_ABS, PARITY, POPCOUNT, CLRSB, CLZ and CTZ etc.
> To make up for this mistake, in this revised patch I've not only restored
> the tests for ABS and FFS, but also added the many sibling RTX codes
> that I'd also expect to see optimized here, such as ABS(PARITY(x)).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-02  Roger Sayle  <ro...@nextmovesoftware.com>
>             Segher Boessenkool  <seg...@kernel.crashing.org>
>             Richard Sandiford  <richard.sandif...@arm.com>

Thanks, but I didn't actually write anything. :-)

> gcc/ChangeLog
>         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Add
>         optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS, CLZ, CTZ
>         and LSHIFTRT that are all positive to complement the existing
>         FFS and (idempotent) ABS simplifications.
>         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>         val_signbit_known_clear_p is true of the operand.
>         Simplify sign extensions of SUBREG truncations of operands
>         that are already suitably (zero) extended.
>         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>         of operands that are already suitably zero extended.
>
> Thanks in advance,
> Roger
> --
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: 02 August 2022 10:39
>> To: Roger Sayle <ro...@nextmovesoftware.com>
>> Cc: gcc-patches@gcc.gnu.org; 'Segher Boessenkool'
>> <seg...@kernel.crashing.org>
>> Subject: Re: [PATCH] Some additional zero-extension related optimizations
> in
>> simplify-rtx.
>> 
>> "Roger Sayle" <ro...@nextmovesoftware.com> writes:
>> > This patch implements some additional zero-extension and
>> > sign-extension related optimizations in simplify-rtx.cc.  The original
>> > motivation comes from PR rtl-optimization/71775, where in comment #2
>> Andrew Pinski sees:
>> >
>> > Failed to match this instruction:
>> > (set (reg:DI 88 [ _1 ])
>> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
>> >
>> > On many platforms the result of DImode CTZ is constrained to be a
>> > small unsigned integer (between 0 and 64), hence the truncation to
>> > 32-bits (using a SUBREG) and the following sign extension back to
>> > 64-bits are effectively a no-op, so the above should ideally (often)
>> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
>> >
>> > To implement this, and some closely related transformations, we build
>> > upon the existing val_signbit_known_clear_p predicate.  In the first
>> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
>> > bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
>> > can itself be simplified.
>> 
>> I think I misunderstood, but just in case: RTL ABS is well-defined for the
> minimum
>> integer (giving back the minimum integer), so we can't assume that ABS
> leaves
>> the sign bit clear.
>> 
>> Thanks,
>> Richard
>> 
>> > The second transformation is that we can canonicalized SIGN_EXTEND to
>> > ZERO_EXTEND (as in the PR 71775 case above) when the operand's
>> > sign-bit is known to be clear.  The final two chunks are for
>> > SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating
>> > SUBREG respectively.  The nonzero_bits of a truncating SUBREG
>> > pessimistically thinks that the upper bits may have an arbitrary value
>> > (by taking the SUBREG), so we need look deeper at the SUBREG's operand
>> > to confirm that the high bits are known to be zero.
>> >
>> > Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
>> > default architecture options is undefined at zero, so we can't be sure
>> > the upper bits of reg:DI 88 will be sign extended (all zeros or all
> ones).
>> > nonzero_bits knows this, so the above transformations don't trigger,
>> > but the transformations themselves are perfectly valid for other
>> > operations such as FFS, POPCOUNT and PARITY, and on other
>> > targets/-march settings where CTZ is defined at zero.
>> >
>> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> > and make -k check, both with and without --target_board=unix{-m32},
>> > with no new failures.  Testing with CSiBE shows these transformations
>> > trigger on several source files (and with -Os reduces the size of the
>> > code).  Ok for mainline?
>> >
>> >
>> > 2022-07-27  Roger Sayle  <ro...@nextmovesoftware.com>
>> >
>> > gcc/ChangeLog
>> >         * simplify_rtx.cc (simplify_unary_operation_1) <ABS>: Simplify
>> >         test as both FFS and ABS result in nonzero_bits returning a
>> >         mask that satisfies val_signbit_known_clear_p.
>> >         <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
>> >         val_signbit_known_clear_p is true of the operand.
>> >         Simplify sign extensions of SUBREG truncations of operands
>> >         that are already suitably (zero) extended.
>> >         <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
>> >         of operands that are already suitably zero extended.
>> >
>> >
>> > Thanks in advance,
>> > Roger
>> > --
>> >
>> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index
>> > fa20665..e62bf56 100644
>> > --- a/gcc/simplify-rtx.cc
>> > +++ b/gcc/simplify-rtx.cc
>> > @@ -1366,9 +1366,8 @@ simplify_context::simplify_unary_operation_1
>> (rtx_code code, machine_mode mode,
>> >    break;
>> >
>> >        /* If operand is something known to be positive, ignore the ABS.
> */
>> > -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
>> > -    || val_signbit_known_clear_p (GET_MODE (op),
>> > -                                  nonzero_bits (op, GET_MODE (op))))
>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>> > +                               nonzero_bits (op, GET_MODE (op))))
>> >    return op;
>> >
>> >        /* If operand is known to be only -1 or 0, convert ABS to NEG.
>> > */ @@ -1615,6 +1614,24 @@ simplify_context::simplify_unary_operation_1
>> (rtx_code code, machine_mode mode,
>> >        }
>> >    }
>> >
>> > +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
>> > +         we know the sign bit of OP must be clear.  */
>> > +      if (val_signbit_known_clear_p (GET_MODE (op),
>> > +                               nonzero_bits (op, GET_MODE (op))))
>> > +  return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>> > +
>> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> > +      if (GET_CODE (op) == SUBREG
>> > +    && subreg_lowpart_p (op)
>> > +    && GET_MODE (SUBREG_REG (op)) == mode
>> > +    && is_a <scalar_int_mode> (mode, &int_mode)
>> > +    && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> > +    && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> > +    && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>> (int_mode)
>> > +    && (nonzero_bits (SUBREG_REG (op), mode)
>> > +        & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
>> > +  return SUBREG_REG (op);
>> > +
>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>> >        /* As we do not know which address space the pointer is referring
> to,
>> >     we can do this only if the target does not support different
>> > pointer @@ -1765,6 +1782,18 @@
>> simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode
>> mode,
>> >                                 op0_mode);
>> >    }
>> >
>> > +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
>> > +      if (GET_CODE (op) == SUBREG
>> > +    && subreg_lowpart_p (op)
>> > +    && GET_MODE (SUBREG_REG (op)) == mode
>> > +    && is_a <scalar_int_mode> (mode, &int_mode)
>> > +    && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
>> > +    && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
>> > +    && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION
>> (int_mode)
>> > +    && (nonzero_bits (SUBREG_REG (op), mode)
>> > +        & ~GET_MODE_MASK (op_mode)) == 0)
>> > +  return SUBREG_REG (op);
>> > +
>> >  #if defined(POINTERS_EXTEND_UNSIGNED)
>> >        /* As we do not know which address space the pointer is referring
> to,
>> >     we can do this only if the target does not support different
>> > pointer
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index fa20665..b53272b 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1366,11 +1366,57 @@ simplify_context::simplify_unary_operation_1 
> (rtx_code code, machine_mode mode,
>       break;
>  
>        /* If operand is something known to be positive, ignore the ABS.  */
> -      if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -       || val_signbit_known_clear_p (GET_MODE (op),
> -                                     nonzero_bits (op, GET_MODE (op))))
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +                                  nonzero_bits (op, GET_MODE (op))))
>       return op;
>  
> +      /* Using nonzero_bits doesn't (currently) work for modes wider than
> +      HOST_WIDE_INT, so the following transformations help simplify
> +      ABS for TImode and wider.  */
> +      switch (GET_CODE (op))
> +     {
> +     case ABS:
> +     case CLRSB:
> +     case FFS:
> +     case PARITY:
> +     case POPCOUNT:
> +     case SS_ABS:
> +       return op;
> +
> +     case CLZ:
> +       if (is_a <scalar_int_mode> (mode, &int_mode)
> +           && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
> +         {
> +           HOST_WIDE_INT val0;
> +           if (CLZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
> +               && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
> +             return op;

Shifts right of negative numbers are implementation-defined,
so I guess this should be unsigned HOST_WIDE_INT instead.
(Or use (val0 & (GET_MODE_MASK (int_mode) >> 1)) == 0,
like you do below.)

Same for CTZ.

OK with that change, thanks.

Richard

> +         }
> +       break;
> +
> +     case CTZ:
> +       if (is_a <scalar_int_mode> (mode, &int_mode)
> +           && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT)
> +         {
> +           HOST_WIDE_INT val0;
> +           if (CTZ_DEFINED_VALUE_AT_ZERO (int_mode, val0)
> +               && (val0 >> (GET_MODE_PRECISION (int_mode) - 1)) == 0)
> +             return op;
> +         }
> +       break;
> +
> +     case LSHIFTRT:
> +       if (CONST_INT_P (XEXP (op, 1))
> +           && INTVAL (XEXP (op, 1)) > 0
> +           && is_a <scalar_int_mode> (mode, &int_mode)
> +           && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (int_mode))
> +         return op;
> +       break;
> +
> +     default:
> +       break;
> +     }
> +
>        /* If operand is known to be only -1 or 0, convert ABS to NEG.  */
>        if (is_a <scalar_int_mode> (mode, &int_mode)
>         && (num_sign_bit_copies (op, int_mode)
> @@ -1615,6 +1661,24 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>           }
>       }
>  
> +      /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> +         we know the sign bit of OP must be clear.  */
> +      if (val_signbit_known_clear_p (GET_MODE (op),
> +                                  nonzero_bits (op, GET_MODE (op))))
> +     return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> +
> +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +       && subreg_lowpart_p (op)
> +       && GET_MODE (SUBREG_REG (op)) == mode
> +       && is_a <scalar_int_mode> (mode, &int_mode)
> +       && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +       && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +       && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +       && (nonzero_bits (SUBREG_REG (op), mode)
> +           & ~(GET_MODE_MASK (op_mode) >> 1)) == 0)
> +     return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>        we can do this only if the target does not support different pointer
> @@ -1765,6 +1829,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>                                    op0_mode);
>       }
>  
> +      /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +      if (GET_CODE (op) == SUBREG
> +       && subreg_lowpart_p (op)
> +       && GET_MODE (SUBREG_REG (op)) == mode
> +       && is_a <scalar_int_mode> (mode, &int_mode)
> +       && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> +       && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +       && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +       && (nonzero_bits (SUBREG_REG (op), mode)
> +           & ~GET_MODE_MASK (op_mode)) == 0)
> +     return SUBREG_REG (op);
> +
>  #if defined(POINTERS_EXTEND_UNSIGNED)
>        /* As we do not know which address space the pointer is referring to,
>        we can do this only if the target does not support different pointer

Reply via email to