Hi Richard, The pre-refactor of this series completed, do you prefer to continue the v2 now or defer to the next stage 1? Both works for me, and thanks a lot.
Pan -----Original Message----- From: Li, Pan2 <[email protected]> Sent: Saturday, November 1, 2025 10:33 AM To: Richard Biener <[email protected]> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Chen, Ken <[email protected]>; Liu, Hongtao <[email protected]>; [email protected] Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 Thanks Richard for comments. > Thanks and sorry for the delay... Never mind, I see you were busy for middle-end vectorization changes recently. > OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. > widen_mult. > IMO this would be perfect for a widening_mult helper? Sure thing, seems also benefits other SAT_MUL pattern(s) introduced in previous too. So I may try to refactor that first. > I see. That's odd. In fact I'd say we'd want to transform > (nop_convert (bit_{ior,and,xor}:c (convert @0) @1)) > to move the conversion inside like > (for op (bit_ior bit_and bit_xor) > (simplify > (nop_convert (op:c (convert @0) @1)) > (op:type (convert @0) (convert @1)))) That make much sense to me, and we have similar scenario for the SAT_MUL pattern in previous, will refactor this first. Pan -----Original Message----- From: Richard Biener <[email protected]> Sent: Friday, October 31, 2025 9:37 PM To: Li, Pan2 <[email protected]> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Chen, Ken <[email protected]>; Liu, Hongtao <[email protected]>; [email protected] Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 On Tue, Oct 28, 2025 at 3:31 AM Li, Pan2 <[email protected]> wrote: > > Thanks Richard for comments. > > > (match > > (widening_mult @0 @1) > > (widen_mult @0 @1)) > > (match > > (widening_mult @0 @1) > > (mult (convert @0) (convert @1)) > > (if (it is widening)) > > I see, that wrap the convert part into another helper predicate. That make > sense to me, and will have a try soon. > I think the difference comes from the the uint128_t/uint64_t has different > gimple compares to the rest. > > 1. WT uint128_t, NT uint64_t, we have gimple as below. > > 8 │ uint128_t x; > 19 │ x_8 = a_6(D) w* b_7(D); > > The NT (aka uint64_t) doesn't need to convert when widen_mul to uint128_t. > > 2. WT uint128_t, NT uint32_t, as well as the rest combination if WT > NT. > > 8 │ uint128_t x; > 21 │ _12 = (unsigned long) a_6(D); > 22 │ _13 = (unsigned long) b_7(D); > 23 │ x_8 = _12 w* _13; > > The NT (aka uint32_t) need to convert to uint64_t before widen_mul to > uint128_t. > > In previous, I found it is not easy to merge the 2 case into one expression. > Thus, add one pattern for case 1 and the other for case 2. OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. widen_mult. IMO this would be perfect for a widening_mult helper? > > Now for the question - part of the pattern is > > (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult > > ...))) > > I wonder whether you've seen the outer conversion being moved inside the > > IOR? > > At least I wonder why there's the (convert ...) around the widening_mult and > > why that's not conditional? The textual description of the pattern mentions > > (NT)x | (NT)overflow_p, so the result should already be in NT, no > > outer conversion > > required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid, > > but we fold that inside the IOR? But not always? > > Some of the cases introduces signed type for some intermediate result, like > WT is uint16_t, NT is uint8_t. > > 11 │ signed char _3; // we have signed char from _3 to _6. > 12 │ signed char _4; > 13 │ signed char _5; > 14 │ signed char _6; > 15 │ uint8_t _11; > 16 │ > 17 │ <bb 2> [local count: 1073741824]: > 18 │ _1 = (short unsigned int) a_7(D); > 19 │ _2 = (short unsigned int) b_8(D); > 20 │ x_9 = _1 * _2; > 21 │ overflow_p_10 = x_9 > 255; > 22 │ _3 = (signed char) overflow_p_10; > 23 │ _4 = -_3; > 24 │ _5 = (signed char) x_9; > 25 │ _6 = _4 | _5; > 26 │ _11 = (uint8_t) _6; I see. That's odd. In fact I'd say we'd want to transform (nop_convert (bit_{ior,and,xor}:c (convert @0) @1)) to move the conversion inside like (for op (bit_ior bit_and bit_xor) (simplify (nop_convert (op:c (convert @0) @1)) (op:type (convert @0) (convert @1)))) > 27 │ return _11; > > When we take a look for WT is uint64_t, NT is uint32_t, we will have unsigned. > > 11 │ unsigned int _3; > 12 │ unsigned int _4; > 13 │ unsigned int _5; > ... > 26 │ _6 = _4 | _5; > > The outer convert with mark '?' is introduced to match both of the above 2 > cases. Thanks and sorry for the delay... Richard. > Pan > > > -----Original Message----- > From: Richard Biener <[email protected]> > Sent: Monday, October 27, 2025 9:14 PM > To: Li, Pan2 <[email protected]> > Cc: [email protected]; [email protected]; [email protected]; > [email protected]; [email protected]; Chen, Ken <[email protected]>; > Liu, Hongtao <[email protected]>; [email protected] > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > On Sun, Oct 26, 2025 at 11:58 AM Li, Pan2 <[email protected]> wrote: > > > > Hi Richard, > > > > I had a try for 2 parts those days. > > > > 1. Remove outer convert?, it will make the pattern failed to match as the > > type is signed > > without that convert, and the types_match (type, @0, @1) to be false. > > 2. Merge 2 into one pattern with (convert?@4 @0) but it will also result in > > pattern match fail(when NT is uint32_t and WT is uint64_t). > > Because the types_match (type, capture[0], capture[1]) will be false, > > as type is SI and capture is DI. > > > > So I think it is possible to separate the common part into a helper pattern > > like unsigned_sat_mul_helper. > > And make the real unsigned_integer_sat_mul(@0, @1) for the first one, and > > unsigned_integer_sat_mul((convert @0), (convert @1) > > for the second one. > > Then we don't need to duplicate most of the patterns up to a point. How do > > you think of it? Thanks a lot. > > Maybe we can first get one of the patterns to a form I understand (the > one with the many conversions)? > Splitting out parts might only obfuscate things. It would be maybe > nice to be able to merge different > alternatives within the same (match ..) sharing both the name and the > conditions, like > > (match (foo @0 @1) > (convert (mult @0 @1)) > (widen_mult @0 @1) > (if (...))) > > If you can structure the two patterns in a way that the (if ...) part > is identical that would help. > > Some question: > > + (convert? > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1))))) > > So I get that we want a widening multiplication but that's either a widen_mult > already or not. We might want to have a > > (match > (widening_mult @0 @1) > (widen_mult @0 @1)) > (match > (widening_mult @0 @1) > (mult (convert @0) (convert @1)) > (if (it is widening)) > > to help merging. Now for the question - part of the pattern is > > (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult > ...))) > > I wonder whether you've seen the outer conversion being moved inside the IOR? > At least I wonder why there's the (convert ...) around the widening_mult and > why that's not conditional? The textual description of the pattern mentions > (NT)x | (NT)overflow_p, so the result should already be in NT, no > outer conversion > required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid, > but we fold that inside the IOR? But not always? > > Thanks, > Richard. > > > Pan > > > > -----Original Message----- > > From: Li, Pan2 > > Sent: Friday, October 24, 2025 9:52 AM > > To: 'Richard Biener' <[email protected]> > > Cc: [email protected]; [email protected]; [email protected]; > > [email protected]; [email protected]; Chen, Ken <[email protected]>; > > Liu, Hongtao <[email protected]>; [email protected] > > Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > Thanks Richard for comments. > > > > Assume we talk about the form 7 as below: > > > > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \ > > NT __attribute__((noinline)) \ > > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \ > > { \ > > WT x = (WT)a * (WT)b; \ > > NT max = -1; \ > > bool overflow_p = x > (WT)(max); \ > > return -(NT)(overflow_p) | (NT)x; \ > > } > > > > > So why is matching the conversion necessary at all, that is, why is (mult > > > @0 @1) > > > not sufficient here? > > > > Because there are many different types combinations, unsigned SAT_MUL need > > the help of a wider type. > > For example: > > > > If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul: > > 21 │ _12 = (unsigned long) a_6(D); > > 22 │ _13 = (unsigned long) b_7(D); > > 23 │ x_8 = _12 w* _13; > > > > If NT is uint64_t, WT is uint128_t, we don't have the convert. > > 19 │ x_8 = a_6(D) w* b_7(D); > > > > > You are not looking at the types of @0 or @1 at > > > all, AFAICS > > > one could be 'char' and one could be 'short'? > > > > The type is restricted by (if (types_match (type, @0, @1)). Aka the result > > must have the same type as @0 and @1. > > > > > Captures on optional nodes are "merged" to the operand, so for > > > (convert?@4 @0) > > > when there is no conversion @4 == @0. That is it behaves as if there > > > were two > > > captures at the place of @0. > > > > I see, sounds like alias here, thanks for the explanation. > > > > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the > > > desired type? > > > That is, I wonder if even when the conversion is missing, the > > > saturated operation > > > can be matched by using (original-type)SAT_UXYZ (...), implying the > > > SAT_UXYZ > > > produces a uint8_t (in this case)? Not sure I phrased this in an > > > understandable way ;) > > > > > I'll note that the outer conversion is poorly constrained given the > > > conversion > > > around the multiplication/inside the negate is what determines the > > > semantics of it > > > and that's not constrained either? > > > > For NT is uint8_t and WT is uint128_t, we have the final convert from > > gimple. The bigger types mentioned in above > > don't have such convert. The _11 and _6 should be almost the same, maybe > > nop_convert here is good enough if it failed > > to match that. I will have a try and keep you posted. > > > > 14 │ signed char _6; > > 15 │ uint8_t _11; > > ... > > 26 │ _3 = (signed char) overflow_p_10; > > 27 │ _4 = -_3; > > 28 │ _5 = (signed char) x_9; > > 29 │ _6 = _4 | _5; > > 30 │ _11 = (uint8_t) _6; > > 31 │ return _11; > > > > Pan > > > > > > -----Original Message----- > > From: Richard Biener <[email protected]> > > Sent: Thursday, October 23, 2025 7:08 PM > > To: Li, Pan2 <[email protected]> > > Cc: [email protected]; [email protected]; [email protected]; > > [email protected]; [email protected]; Chen, Ken <[email protected]>; > > Liu, Hongtao <[email protected]>; [email protected] > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote: > > > > > > Thanks Richard for comments. > > > > > > > I'm a bit confused as to this for and the explicit widen_mult case in > > > > the 2nd > > > > pattern below. In fact, the patterns look similar enough to me and I > > > > wonder > > > > whether the consumer can handle the outer conversion and the conversion > > > > of the leafs of the multiplication? The conversion of the leafs > > > > doesn't have > > > > any constraints, nor is it obvious why there's an outer conversion > > > > around the > > > > IOR. > > > > > > The two pattern looks similar up to a point. I separate them into 2 > > > mostly comes from > > > the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)). > > > Some diff > > > type combination need to convert while others are not. > > > > > > I tried to leverage something like (mult (convert?@4 @0) (convert?@5 @1)) > > > to put > > > them together. But it may has additional sematics because add '?' after > > > convert will > > > cover (mult (convert@4 @0) @1) which is not expected. > > > > So why is matching the conversion necessary at all, that is, why is (mult > > @0 @1) > > not sufficient here? You are not looking at the types of @0 or @1 at > > all, AFAICS > > one could be 'char' and one could be 'short'? > > > > > Another problem about add '?' after convert is capture @4/@5, if there is > > > no need > > > to convert, I am not sure how to take care of the @4 from the “with“” > > > scope because it > > > is optional. The code acts on capture @4/@5 is not correct when there is > > > no convert. > > > > Captures on optional nodes are "merged" to the operand, so for (convert?@4 > > @0) > > when there is no conversion @4 == @0. That is it behaves as if there were > > two > > captures at the place of @0. > > > > > I may have another try to merge them into one, but is there any > > > suggestion here? > > > > > > > outer conversion around the > > > > IOR. > > > > > > Some type(s) may need to do a convert to the final type, like > > > sat_u_mul-8-u8-from-u32, > > > The seq may look like blow. > > > > > > 25 │ _6 = _4 | _5; > > > 26 │ _11 = (uint8_t) _6; > > > 27 │ return _11; > > > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the > > desired type? > > That is, I wonder if even when the conversion is missing, the > > saturated operation > > can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ > > produces a uint8_t (in this case)? Not sure I phrased this in an > > understandable way ;) > > > > I'll note that the outer conversion is poorly constrained given the > > conversion > > around the multiplication/inside the negate is what determines the > > semantics of it > > and that's not constrained either? > > > > Richard. > > > > > Pan > > > > > > -----Original Message----- > > > From: Richard Biener <[email protected]> > > > Sent: Wednesday, October 22, 2025 5:00 PM > > > To: Li, Pan2 <[email protected]> > > > Cc: [email protected]; [email protected]; [email protected]; > > > [email protected]; [email protected]; Chen, Ken > > > <[email protected]>; Liu, Hongtao <[email protected]>; > > > [email protected] > > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > > > On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote: > > > > > > > > From: Pan Li <[email protected]> > > > > > > > > This patch would like to try to match the the unsigned > > > > SAT_MUL form 7, aka below: > > > > > > > > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \ > > > > NT __attribute__((noinline)) \ > > > > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \ > > > > { \ > > > > WT x = (WT)a * (WT)b; \ > > > > NT max = -1; \ > > > > bool overflow_p = x > (WT)(max); \ > > > > return -(NT)(overflow_p) | (NT)x; \ > > > > } > > > > > > > > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and > > > > NT is uint64_t, uint32_t, uint16_t or uint8_t. > > > > > > > > gcc/ChangeLog: > > > > > > > > * match.pd: Add pattern for SAT_MUL form 5 include > > > > mul and widen_mul. > > > > > > > > Signed-off-by: Pan Li <[email protected]> > > > > --- > > > > gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++ > > > > 1 file changed, 35 insertions(+) > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > index bfc51e6579a..0f55a82d989 100644 > > > > --- a/gcc/match.pd > > > > +++ b/gcc/match.pd > > > > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > bool widen_mult_p = prec * 2 == widen_prec; > > > > } > > > > (if (c2_is_type_precision_p && widen_mult_p))))) > > > > + (for mult_op (mult widen_mult) > > > > > > I'm a bit confused as to this for and the explicit widen_mult case in the > > > 2nd > > > pattern below. In fact, the patterns look similar enough to me and I > > > wonder > > > whether the consumer can handle the outer conversion and the conversion > > > of the leafs of the multiplication? The conversion of the leafs doesn't > > > have > > > any constraints, nor is it obvious why there's an outer conversion around > > > the > > > IOR. > > > > > > > + (match (unsigned_integer_sat_mul @0 @1) > > > > + (convert? > > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > > > > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1))))) > > > > + (if (types_match (type, @0, @1)) > > > > + (with > > > > + { > > > > + unsigned prec = TYPE_PRECISION (type); > > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3)); > > > > + unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4)); > > > > + unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5)); > > > > + > > > > + wide_int max = wi::mask (prec, false, widen_prec); > > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max); > > > > + bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec == > > > > cvt5_prec > > > > + && widen_prec == cvt5_prec * 2; > > > > + bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec > > > > + && cvt4_prec == widen_prec && widen_prec > prec; > > > > + } > > > > + (if (c2_is_max_p && (mult_p || widen_mult_p))))))) > > > > + (match (unsigned_integer_sat_mul @0 @1) > > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > > > > + (convert (widen_mult:c@3 @0 @1))) > > > > + (if (types_match (type, @0, @1)) > > > > + (with > > > > + { > > > > + unsigned prec = TYPE_PRECISION (type); > > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3)); > > > > + > > > > + wide_int max = wi::mask (prec, false, widen_prec); > > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max); > > > > + bool widen_mult_p = prec * 2 == widen_prec; > > > > + } > > > > + (if (c2_is_max_p && widen_mult_p))))) > > > > ) > > > > > > > > /* The boundary condition for case 10: IMM = 1: > > > > -- > > > > 2.43.0 > > > >
