Thanks for the review!

Andrew Pinski <pins...@gmail.com> writes:
> On Wed, Mar 12, 2025 at 12:00 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Using a combination of rules, we were able to fold
>>
>>   ((X >> C1) & C2) * (1 << C1)  -->  X & (C2 << C1)
>>
>> if everything was done at the same precision, but we couldn't fold
>> it if the AND was done at a different precision.  The optimisation is
>> often (but not always) valid for that case too.
>>
>> This patch adds a dedicated rule for the case where different precisions
>> are involved.
>>
>> An alternative would be to extend the individual folds that together
>> handle the same-precision case so that those rules handle differing
>> precisions.  But the risk is that that could replace narrow operations
>> with wide operations, which would be especially harmful on targets
>> like avr.  It's also not obviously free of cycles.
>>
>> I also wondered whether the converts should be non-optional.
>>
>> gcc/
>>      * match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1).
>>
>> gcc/testsuite/
>>      * gcc.dg/fold-mul-and-lshift-1.c: New test.
>>      * gcc.dg/fold-mul-and-lshift-2.c: Likewise.
>> ---
>>  gcc/match.pd                                 | 29 ++++++++++
>>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c | 59 ++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c | 15 +++++
>>  3 files changed, 103 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 5c679848bdf..3197d1cac75 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -5231,6 +5231,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>      (if (mask)
>>       (bit_op (shift (convert @0) @1) { mask; })))))))
>>
>> +/* Fold ((X >> C1) & C2) * (1 << C1) into X & (C2 << C1), including cases where
>> +   the & happens in a different type.  It is the conversion case that isn't
>> +   a composition of other folds.
>> +
>> +   Let the type of the * and >> be T1 and the type of the & be T2.
>> +   The fold is valid if the conversion to T2 preserves all information;
>> +   that is, if T2 is wider than T1 or drops no more than C1 bits from T1.
>> +   In that case, the & might operate on bits that are dropped by the
>> +   later conversion to T1 and the multiplication by (1 << C1), but those
>> +   bits are also dropped by ANDing with C2 << C1 (converted to T1).
>> +
>> +   If the conversion to T2 is not information-preserving, we have to be
>> +   careful about the later conversion to T1 acting as a sign extension.
>> +   We need either T2 to be unsigned or the top (sign) bit of C2 to be clear.
>> +   That is equivalent to testing whether C2 is nonnegative.  */
>> +(simplify
>> + (mult
>> +  (convert? (bit_and (convert? (rshift @0 INTEGER_CST@1)) INTEGER_CST@2))
>> +  INTEGER_CST@3)
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (with { auto prec = element_precision (type); }
>
> Since we know this needs to be a scalar, Using TREE_PRECISION here is fine
> too.
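For anyone skimming the thread, the identity that the quoted rule folds is easy to sanity-check outside the compiler with a throwaway program along the lines of the one below.  The wider type, the constants, and the function names are arbitrary choices made here for illustration; this is not one of the patch's testcases.

/* Quick check of ((X >> C1) & C2) * (1 << C1) == X & (C2 << C1) when
   the AND is done in a wider type (T2 = unsigned long long) than the
   shift and multiply (T1 = unsigned int), i.e. the information-
   preserving case described in the new comment.  */
#include <assert.h>

#define C1 3
#define C2 0xffULL

static unsigned int
lhs (unsigned int x)
{
  /* >> and * in T1, & in T2.  */
  return (unsigned int) ((unsigned long long) (x >> C1) & C2) * (1U << C1);
}

static unsigned int
rhs (unsigned int x)
{
  return x & (unsigned int) (C2 << C1);
}

int
main (void)
{
  unsigned int tests[] = { 0, 1, 0xff, 0x12345678, 0xffffffffU };
  for (unsigned int i = 0; i < sizeof tests / sizeof tests[0]; i++)
    assert (lhs (tests[i]) == rhs (tests[i]));
  return 0;
}

Narrowing T2 by more than C1 bits instead is where the sign-extension concern in the quoted comment comes in, hence the extra condition on C2 being nonnegative.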
Yeah, agreed.  I'd wondered whether to use TREE_PRECISION instead, but then
I'd also wondered about trying to make the fold work for vectors.  Guess
I ended up between two stools.

>> +   (if (wi::ltu_p (wi::to_widest (@1), prec))
>
> I think using wi::to_wide is better than using wi::to_widest here.

What's the reason for preferring wi::to_wide?  wi::to_widest should usually
be more efficient for this kind of check, since the tree representation
allows the underlying HWIs to be used directly.  wi::to_wide instead
requires masking off bits above the precision.

E.g. on an --enable-checking=release compiler:

bool
foo (tree t, unsigned int n)
{
  return wi::ltu_p (wi::to_widest (t), n);
}

gives:

    188c:  79400c02  ldrh   w2, [x0, #6]
    1890:  7100045f  cmp    w2, #0x1
    1894:  54000060  b.eq   18a0 <foo(tree_node*, unsigned int)+0x14>  // b.none
    1898:  52800000  mov    w0, #0x0                   // #0
    189c:  d65f03c0  ret
    18a0:  f9400800  ldr    x0, [x0, #16]
    18a4:  eb21401f  cmp    x0, w1, uxtw
    18a8:  1a9f27e0  cset   w0, cc  // cc = lo, ul, last
    18ac:  d65f03c0  ret

whereas:

bool
foo (tree t, unsigned int n)
{
  return wi::ltu_p (wi::to_wide (t), n);
}

gives:

    188c:  79400802  ldrh   w2, [x0, #4]
    1890:  7100045f  cmp    w2, #0x1
    1894:  54000060  b.eq   18a0 <foo(tree_node*, unsigned int)+0x14>  // b.none
    1898:  52800000  mov    w0, #0x0                   // #0
    189c:  d65f03c0  ret
    18a0:  a9408800  ldp    x0, x2, [x0, #8]
    18a4:  79406c03  ldrh   w3, [x0, #54]
    18a8:  92800000  mov    x0, #0xffffffffffffffff    // #-1
    18ac:  7100fc7f  cmp    w3, #0x3f
    18b0:  9ac32000  lsl    x0, x0, x3
    18b4:  8a200040  bic    x0, x2, x0
    18b8:  9a829002  csel   x2, x0, x2, ls  // ls = plast
    18bc:  eb21405f  cmp    x2, w1, uxtw
    18c0:  1a9f27e0  cset   w0, cc  // cc = lo, ul, last
    18c4:  d65f03c0  ret

> The other place in match which checks shift count does:
>  /* Use a signed compare to leave negative shift counts alone.  */
>  && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
>                element_precision (type)))
>
>
>> +    (with { auto shift = tree_to_uhwi (@1); }
>> +     (if ((prec <= element_precision (TREE_TYPE (@2)) + shift
>> +           || wi::to_widest (@2) >= 0)
>
> I think `wi::to_widest (@2) >= 0` can be written as
> `!tree_int_cst_sign_bit (@2)`.

Ah, yeah, thanks.

Richard
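Aside for readers of the archive: for a constant of a signed type, the suggested spelling relies on "value >= 0" being the same test as "top bit clear".  The standalone sketch below illustrates that equivalence in plain C; it is not GCC code, and the 32-bit width is just an arbitrary choice for the illustration.

/* For a signed 32-bit value v, "v >= 0" holds exactly when its top
   (sign) bit is clear, which is what testing the sign bit checks.  */
#include <assert.h>
#include <stdint.h>

int
main (void)
{
  int32_t vals[] = { 0, 1, 42, INT32_MAX, -1, -42, INT32_MIN };
  for (unsigned int i = 0; i < sizeof vals / sizeof vals[0]; i++)
    {
      int nonnegative = vals[i] >= 0;
      int sign_bit_clear = (((uint32_t) vals[i] >> 31) & 1) == 0;
      assert (nonnegative == sign_bit_clear);
    }
  return 0;
}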