C1) & C2) * (1 << C1)

Andrew Pinski Wed, 12 Mar 2025 13:41:06 -0700

On Wed, Mar 12, 2025 at 1:38 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Thanks for the review!
>
> Andrew Pinski <pins...@gmail.com> writes:
> > On Wed, Mar 12, 2025 at 12:00 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Using a combination of rules, we were able to fold
> >>
> >>   ((X >> C1) & C2) * (1 << C1)  -->  X & (C2 << C1)
> >>
> >> if everything was done at the same precision, but we couldn't fold
> >> it if the AND was done at a different precision.  The optimisation is
> >> often (but not always) valid for that case too.
> >>
> >> This patch adds a dedicated rule for the case where different precisions
> >> are involved.
> >>
> >> An alternative would be to extend the individual folds that together
> >> handle the same-precision case so that those rules handle differing
> >> precisions.  But the risk is that that could replace narrow operations
> >> with wide operations, which would be especially harmful on targets
> >> like avr.  It's also not obviously free of cycles.
> >>
> >> I also wondered whether the converts should be non-optional.
> >>
> >> gcc/
> >>         * match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1).
> >>
> >> gcc/testsuite/
> >>         * gcc.dg/fold-mul-and-lshift-1.c: New test.
> >>         * gcc.dg/fold-mul-and-lshift-2.c: Likewise.
> >> ---
> >>  gcc/match.pd                                 | 29 ++++++++++
> >>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c | 59 ++++++++++++++++++++
> >>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c | 15 +++++
> >>  3 files changed, 103 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 5c679848bdf..3197d1cac75 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -5231,6 +5231,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>       (if (mask)
> >>        (bit_op (shift (convert @0) @1) { mask; })))))))
> >>
> >> +/* Fold ((X >> C1) & C2) * (1 << C1) into X & (C2 << C1), including cases 
> >> where
> >> +   the & happens in a different type.  It is the conversion case that 
> >> isn't
> >> +   a composition of other folds.
> >> +
> >> +   Let the type of the * and >> be T1 and the type of the & be T2.
> >> +   The fold is valid if the conversion to T2 preserves all information;
> >> +   that is, if T2 is wider than T1 or drops no more than C1 bits from T1.
> >> +   In that case, the & might operate on bits that are dropped by the
> >> +   later conversion to T1 and the multiplication by (1 << C1), but those
> >> +   bits are also dropped by ANDing with C2 << C1 (converted to T1).
> >> +
> >> +   If the conversion to T2 is not information-preserving, we have to be
> >> +   careful about the later conversion to T1 acting as a sign extension.
> >> +   We need either T2 to be unsigned or the top (sign) bit of C2 to be 
> >> clear.
> >> +   That is equivalent to testing whether C2 is nonnegative.  */
> >> +(simplify
> >> + (mult
> >> +  (convert? (bit_and (convert? (rshift @0 INTEGER_CST@1)) INTEGER_CST@2))
> >> +  INTEGER_CST@3)
> >> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> >> +  (with { auto prec = element_precision (type); }
> > Since we know this needs to be a scalar, Using TREE_PRECISION here is fine 
> > too.
>
> Yeah, agreed.  I'd wondered whether to use TREE_PRECISION instead,
> but then I'd also wondered about trying to make the fold work for ectors.
> Guess I ended up between two stools.
>
> >> +   (if (wi::ltu_p (wi::to_widest (@1), prec))
> >
> > I think using wi::to_wide is better than using wi::to_widest here.
>
> What's the reason for preferring wi::to_wide?  wi::to_widest should
> usually be more efficient for this kind of check, since the tree
> representation allows the underlying HWIs to be used directly.
> wi::to_wide instead requires masking off bits above the precision.
>
> E.g. on an --enable-checking=release compiler:
>
> bool
> foo (tree t, unsigned int n)
> {
>   return wi::ltu_p (wi::to_widest (t), n);
> }
>
> gives:
>
>     188c:       79400c02        ldrh    w2, [x0, #6]
>     1890:       7100045f        cmp     w2, #0x1
>     1894:       54000060        b.eq    18a0 <foo(tree_node*, unsigned 
> int)+0x14>  // b.none
>     1898:       52800000        mov     w0, #0x0                        // #0
>     189c:       d65f03c0        ret
>     18a0:       f9400800        ldr     x0, [x0, #16]
>     18a4:       eb21401f        cmp     x0, w1, uxtw
>     18a8:       1a9f27e0        cset    w0, cc  // cc = lo, ul, last
>     18ac:       d65f03c0        ret
>
> whereas:
>
> bool
> foo (tree t, unsigned int n)
> {
>   return wi::ltu_p (wi::to_wide (t), n);
> }
>
> gives:
>
>     188c:       79400802        ldrh    w2, [x0, #4]
>     1890:       7100045f        cmp     w2, #0x1
>     1894:       54000060        b.eq    18a0 <foo(tree_node*, unsigned 
> int)+0x14>  // b.none
>     1898:       52800000        mov     w0, #0x0                        // #0
>     189c:       d65f03c0        ret
>     18a0:       a9408800        ldp     x0, x2, [x0, #8]
>     18a4:       79406c03        ldrh    w3, [x0, #54]
>     18a8:       92800000        mov     x0, #0xffffffffffffffff         // #-1
>     18ac:       7100fc7f        cmp     w3, #0x3f
>     18b0:       9ac32000        lsl     x0, x0, x3
>     18b4:       8a200040        bic     x0, x2, x0
>     18b8:       9a829002        csel    x2, x0, x2, ls  // ls = plast
>     18bc:       eb21405f        cmp     x2, w1, uxtw
>     18c0:       1a9f27e0        cset    w0, cc  // cc = lo, ul, last
>     18c4:       d65f03c0        ret


Oh I guess I didn't realize that. Maybe there are other places which
should get this handling too (obviously for another time).

Thanks,
Andrew

>
> > The other place in match which checks shift count does:
> >        /* Use a signed compare to leave negative shift counts alone.  */
> >        && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
> >                      element_precision (type)))
> >
> >
> >> +    (with { auto shift = tree_to_uhwi (@1); }
> >> +     (if ((prec <= element_precision (TREE_TYPE (@2)) + shift
> >> +          || wi::to_widest (@2) >= 0)
> >
> > I think `wi::to_widest (@2) >= 0` can be written as
> > `!tree_int_cst_sign_bit (@2)`.
>
> Ah, yeah, thanks.
>
> Richard

Re: [PATCH 1/2] match.pd: Fold ((X >> C1) & C2) * (1 << C1)

Reply via email to