On Wed, Mar 12, 2025 at 1:38 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Thanks for the review!
>
> Andrew Pinski <pins...@gmail.com> writes:
> > On Wed, Mar 12, 2025 at 12:00 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Using a combination of rules, we were able to fold
> >>
> >>   ((X >> C1) & C2) * (1 << C1)  -->  X & (C2 << C1)
> >>
> >> if everything was done at the same precision, but we couldn't fold
> >> it if the AND was done at a different precision.  The optimisation is
> >> often (but not always) valid for that case too.
> >>
> >> This patch adds a dedicated rule for the case where different precisions
> >> are involved.
> >>
> >> An alternative would be to extend the individual folds that together
> >> handle the same-precision case so that those rules handle differing
> >> precisions.  But the risk is that that could replace narrow operations
> >> with wide operations, which would be especially harmful on targets
> >> like avr.  It's also not obviously free of cycles.
> >>
> >> I also wondered whether the converts should be non-optional.
> >>
> >> gcc/
> >>         * match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1).
> >>
> >> gcc/testsuite/
> >>         * gcc.dg/fold-mul-and-lshift-1.c: New test.
> >>         * gcc.dg/fold-mul-and-lshift-2.c: Likewise.
> >> ---
> >>  gcc/match.pd                                 | 29 ++++++++++
> >>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c | 59 ++++++++++++++++++++
> >>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c | 15 +++++
> >>  3 files changed, 103 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 5c679848bdf..3197d1cac75 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -5231,6 +5231,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>      (if (mask)
> >>       (bit_op (shift (convert @0) @1) { mask; })))))))
> >>
> >> +/* Fold ((X >> C1) & C2) * (1 << C1) into X & (C2 << C1), including
> >> +   cases where the & happens in a different type.  It is the conversion
> >> +   case that isn't a composition of other folds.
> >> +
> >> +   Let the type of the * and >> be T1 and the type of the & be T2.
> >> +   The fold is valid if the conversion to T2 preserves all information;
> >> +   that is, if T2 is wider than T1 or drops no more than C1 bits from T1.
> >> +   In that case, the & might operate on bits that are dropped by the
> >> +   later conversion to T1 and the multiplication by (1 << C1), but those
> >> +   bits are also dropped by ANDing with C2 << C1 (converted to T1).
> >> +
> >> +   If the conversion to T2 is not information-preserving, we have to be
> >> +   careful about the later conversion to T1 acting as a sign extension.
> >> +   We need either T2 to be unsigned or the top (sign) bit of C2 to be
> >> +   clear.  That is equivalent to testing whether C2 is nonnegative.  */
> >> +(simplify
> >> + (mult
> >> +  (convert? (bit_and (convert? (rshift @0 INTEGER_CST@1)) INTEGER_CST@2))
> >> +  INTEGER_CST@3)
> >> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> >> +  (with { auto prec = element_precision (type); }
>
> > Since we know this needs to be a scalar, using TYPE_PRECISION here is fine
> > too.
>
> Yeah, agreed.  I'd wondered whether to use TYPE_PRECISION instead,
> but then I'd also wondered about trying to make the fold work for vectors.
> Guess I ended up between two stools.
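For concreteness, here is a hand-written, self-checking sketch of the
wider-T2 case that the new comment describes; it is not taken from the
patch's fold-mul-and-lshift tests, and the helper names are made up.
The & is done in unsigned long long (T2) while the shift and
multiplication are done in unsigned int (T1), with C1 = 4 and C2 = 0x3f,
so the conversion to T2 preserves all information and the result should
equal X & (C2 << C1):

  #include <assert.h>

  /* T1 = unsigned int, T2 = unsigned long long, C1 = 4, C2 = 0x3f.  */
  static unsigned int
  lhs (unsigned int x)
  {
    /* ((X >> C1) & C2) * (1 << C1), with the & done in T2.  */
    return (unsigned int) ((unsigned long long) (x >> 4) & 0x3fULL) * 16;
  }

  static unsigned int
  rhs (unsigned int x)
  {
    /* X & (C2 << C1), computed entirely in T1.  */
    return x & (0x3fU << 4);
  }

  int
  main (void)
  {
    for (unsigned int x = 0; x < 1u << 20; ++x)
      assert (lhs (x) == rhs (x));
    assert (lhs (0xffffffffu) == rhs (0xffffffffu));
    return 0;
  }

Whether the rule fires for a particular source function depends on the
GIMPLE that reaches match.pd; the sketch only demonstrates the
arithmetic identity behind the fold.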
>
> >> +   (if (wi::ltu_p (wi::to_widest (@1), prec))
>
> > I think using wi::to_wide is better than using wi::to_widest here.
>
> What's the reason for preferring wi::to_wide?  wi::to_widest should
> usually be more efficient for this kind of check, since the tree
> representation allows the underlying HWIs to be used directly.
> wi::to_wide instead requires masking off bits above the precision.
>
> E.g. on an --enable-checking=release compiler:
>
>   bool
>   foo (tree t, unsigned int n)
>   {
>     return wi::ltu_p (wi::to_widest (t), n);
>   }
>
> gives:
>
>     188c:       79400c02        ldrh    w2, [x0, #6]
>     1890:       7100045f        cmp     w2, #0x1
>     1894:       54000060        b.eq    18a0 <foo(tree_node*, unsigned int)+0x14>  // b.none
>     1898:       52800000        mov     w0, #0x0                        // #0
>     189c:       d65f03c0        ret
>     18a0:       f9400800        ldr     x0, [x0, #16]
>     18a4:       eb21401f        cmp     x0, w1, uxtw
>     18a8:       1a9f27e0        cset    w0, cc  // cc = lo, ul, last
>     18ac:       d65f03c0        ret
>
> whereas:
>
>   bool
>   foo (tree t, unsigned int n)
>   {
>     return wi::ltu_p (wi::to_wide (t), n);
>   }
>
> gives:
>
>     188c:       79400802        ldrh    w2, [x0, #4]
>     1890:       7100045f        cmp     w2, #0x1
>     1894:       54000060        b.eq    18a0 <foo(tree_node*, unsigned int)+0x14>  // b.none
>     1898:       52800000        mov     w0, #0x0                        // #0
>     189c:       d65f03c0        ret
>     18a0:       a9408800        ldp     x0, x2, [x0, #8]
>     18a4:       79406c03        ldrh    w3, [x0, #54]
>     18a8:       92800000        mov     x0, #0xffffffffffffffff         // #-1
>     18ac:       7100fc7f        cmp     w3, #0x3f
>     18b0:       9ac32000        lsl     x0, x0, x3
>     18b4:       8a200040        bic     x0, x2, x0
>     18b8:       9a829002        csel    x2, x0, x2, ls  // ls = plast
>     18bc:       eb21405f        cmp     x2, w1, uxtw
>     18c0:       1a9f27e0        cset    w0, cc  // cc = lo, ul, last
>     18c4:       d65f03c0        ret
Oh I guess I didn't realize that. Maybe there are other places which
should get this handling too (obviously for another time).

Thanks,
Andrew

>
> > The other place in match which checks shift count does:
> >     /* Use a signed compare to leave negative shift counts alone.  */
> >     && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
> >                   element_precision (type)))
> >
> >
> >> +    (with { auto shift = tree_to_uhwi (@1); }
> >> +     (if ((prec <= element_precision (TREE_TYPE (@2)) + shift
> >> +           || wi::to_widest (@2) >= 0)
> >
> > I think `wi::to_widest (@2) >= 0` can be written as
> > `!tree_int_cst_sign_bit (@2)`.
>
> Ah, yeah, thanks.
>
> Richard
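As a footnote, the sign-extension hazard that the comment and the
!tree_int_cst_sign_bit (@2) suggestion are guarding against can be made
concrete with a small constructed example (again not part of the patch).
It simulates the GIMPLE-level types with explicit casts, since C's
integer promotions would otherwise perform the & in int.  With
T1 = int32_t, T2 = int8_t, C1 = 1 and C2 = -2, the conversion to T2
drops more than C1 bits and C2's sign bit is set, so the folded and
unfolded forms disagree:

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    int32_t x = 0x100;
    /* X >> C1, narrowed to T2: 0x80 wraps to -128 under GCC's modulo
       conversion semantics, so information is lost.  */
    int8_t narrowed = (int8_t) (x >> 1);
    int8_t anded = (int8_t) (narrowed & -2);   /* still -128 */
    int32_t unfolded = (int32_t) anded * 2;    /* sign-extends: -256 */
    int32_t folded = x & -4;                   /* C2 << C1 = -4, gives 256 */
    assert (unfolded != folded);
    return 0;
  }

Here wi::to_widest (@2) >= 0 (equivalently !tree_int_cst_sign_bit (@2))
is false and prec exceeds the precision of T2 plus the shift count, so
the condition quoted above correctly rejects the fold.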