RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7

Li, Pan2 Mon, 24 Nov 2025 06:03:43 -0800

Hi Richard,

The pre-refactor of this series completed, do you prefer to continue the v2 now 
or defer to the next stage 1?
Both works for me, and thanks a lot.


Pan

-----Original Message-----
From: Li, Pan2 <[email protected]> 
Sent: Saturday, November 1, 2025 10:33 AM
To: Richard Biener <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; 
[email protected]; [email protected]; Chen, Ken <[email protected]>; 
Liu, Hongtao <[email protected]>; [email protected]
Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7

Thanks Richard for comments.

> Thanks and sorry for the delay...

Never mind, I see you were busy for middle-end vectorization changes recently.

> OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. 
> widen_mult.
> IMO this would be perfect for a widening_mult helper?

Sure thing, seems also benefits other SAT_MUL pattern(s) introduced in previous 
too. So I may try to refactor that first.

> I see.   That's odd.  In fact I'd say we'd want to transform
> (nop_convert (bit_{ior,and,xor}:c (convert @0) @1))
> to move the conversion inside like

> (for op (bit_ior bit_and bit_xor)
>  (simplify
>   (nop_convert (op:c (convert @0) @1))
>   (op:type (convert @0) (convert @1))))

That make much sense to me, and we have similar scenario for the SAT_MUL 
pattern in previous, will refactor this first.

Pan

-----Original Message-----
From: Richard Biener <[email protected]> 
Sent: Friday, October 31, 2025 9:37 PM
To: Li, Pan2 <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; 
[email protected]; [email protected]; Chen, Ken <[email protected]>; 
Liu, Hongtao <[email protected]>; [email protected]
Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7

On Tue, Oct 28, 2025 at 3:31 AM Li, Pan2 <[email protected]> wrote:
>
> Thanks Richard for comments.
>
> > (match
> > (widening_mult @0 @1)
> > (widen_mult @0 @1))
> > (match
> > (widening_mult @0 @1)
> > (mult (convert @0) (convert @1))
> >  (if (it is widening))
>
> I see, that wrap the convert part into another helper predicate. That make 
> sense to me, and will have a try soon.
> I think the difference comes from the the uint128_t/uint64_t has different 
> gimple compares to the rest.
>
> 1. WT uint128_t, NT uint64_t, we have gimple as below.
>
>   8   │   uint128_t x;
> 19   │   x_8 = a_6(D) w* b_7(D);
>
> The NT (aka uint64_t) doesn't need to convert when widen_mul to uint128_t.
>
> 2. WT uint128_t, NT uint32_t, as well as the rest combination if WT > NT.
>
>     8   │   uint128_t x;
>   21   │   _12 = (unsigned long) a_6(D);
>   22   │   _13 = (unsigned long) b_7(D);
>   23   │   x_8 = _12 w* _13;
>
> The NT (aka uint32_t) need to convert to uint64_t before widen_mul to 
> uint128_t.
>
> In previous, I found it is not easy to merge the 2 case into one expression.
> Thus, add one pattern for case 1 and the other for case 2.

OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. widen_mult.
IMO this would be perfect for a widening_mult helper?

> > Now for the question - part of the pattern is
> >  (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult 
> > ...)))
> > I wonder whether you've seen the outer conversion being moved inside the 
> > IOR?
> > At least I wonder why there's the (convert ...) around the widening_mult and
> > why that's not conditional?  The textual description of the pattern mentions
> > (NT)x | (NT)overflow_p, so the result should already be in NT, no
> > outer conversion
> > required?  That said, (NT)((WT)x | (WT)overflow_p) would be equally valid,
> > but we fold that inside the IOR?  But not always?
>
> Some of the cases introduces signed type for some intermediate result, like 
> WT is uint16_t, NT is uint8_t.
>
>   11   │   signed char _3;            // we have signed char from _3 to _6.
>   12   │   signed char _4;
>   13   │   signed char _5;
>   14   │   signed char _6;
>   15   │   uint8_t _11;
>   16   │
>   17   │   <bb 2> [local count: 1073741824]:
>   18   │   _1 = (short unsigned int) a_7(D);
>   19   │   _2 = (short unsigned int) b_8(D);
>   20   │   x_9 = _1 * _2;
>   21   │   overflow_p_10 = x_9 > 255;
>   22   │   _3 = (signed char) overflow_p_10;
>   23   │   _4 = -_3;
>   24   │   _5 = (signed char) x_9;
>   25   │   _6 = _4 | _5;
>   26   │   _11 = (uint8_t) _6;

I see.   That's odd.  In fact I'd say we'd want to transform
(nop_convert (bit_{ior,and,xor}:c (convert @0) @1))
to move the conversion inside like

(for op (bit_ior bit_and bit_xor)
 (simplify
  (nop_convert (op:c (convert @0) @1))
  (op:type (convert @0) (convert @1))))

>   27   │   return _11;
>
> When we take a look for WT is uint64_t, NT is uint32_t, we will have unsigned.
>
>   11   │   unsigned int _3;
>   12   │   unsigned int _4;
>   13   │   unsigned int _5;
> ...
>   26   │   _6 = _4 | _5;
>
> The outer convert with mark '?' is introduced to match both of the above 2 
> cases.

Thanks and sorry for the delay...
Richard.

> Pan
>
>
> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Monday, October 27, 2025 9:14 PM
> To: Li, Pan2 <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; 
> [email protected]; [email protected]; Chen, Ken <[email protected]>; 
> Liu, Hongtao <[email protected]>; [email protected]
> Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
>
> On Sun, Oct 26, 2025 at 11:58 AM Li, Pan2 <[email protected]> wrote:
> >
> > Hi Richard,
> >
> > I had a try for 2 parts those days.
> >
> > 1. Remove outer convert?, it will make the pattern failed to match as the 
> > type is signed
> >      without that convert, and the types_match (type, @0, @1) to be false.
> > 2.  Merge 2 into one pattern with (convert?@4 @0) but it will also result in
> >      pattern match fail(when NT is uint32_t and WT is uint64_t).
> >      Because the types_match (type, capture[0], capture[1]) will be false, 
> > as type is SI and capture is DI.
> >
> > So I think it is possible to separate the common part into a helper pattern 
> > like unsigned_sat_mul_helper.
> > And make the real unsigned_integer_sat_mul(@0, @1) for the first one, and 
> > unsigned_integer_sat_mul((convert @0), (convert @1)
> > for the second one.
> > Then we don't need to duplicate most of the patterns up to a point. How do 
> > you think of it? Thanks a lot.
>
> Maybe we can first get one of the patterns to a form I understand (the
> one with the many conversions)?
> Splitting out parts might only obfuscate things.  It would be maybe
> nice to be able to merge different
> alternatives within the same (match ..) sharing both the name and the
> conditions, like
>
> (match (foo @0 @1)
>  (convert (mult @0 @1))
>  (widen_mult @0 @1)
>  (if (...)))
>
> If you can structure the two patterns in a way that the (if ...) part
> is identical that would help.
>
> Some question:
>
> +   (convert?
> +    (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> +            (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
>
> So I get that we want a widening multiplication but that's either a widen_mult
> already or not.  We might want to have a
>
> (match
>  (widening_mult @0 @1)
>  (widen_mult @0 @1))
> (match
>  (widening_mult @0 @1)
>  (mult (convert @0) (convert @1))
>  (if (it is widening))
>
> to help merging.  Now for the question - part of the pattern is
>
>   (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult 
> ...)))
>
> I wonder whether you've seen the outer conversion being moved inside the IOR?
> At least I wonder why there's the (convert ...) around the widening_mult and
> why that's not conditional?  The textual description of the pattern mentions
> (NT)x | (NT)overflow_p, so the result should already be in NT, no
> outer conversion
> required?  That said, (NT)((WT)x | (WT)overflow_p) would be equally valid,
> but we fold that inside the IOR?  But not always?
>
> Thanks,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Friday, October 24, 2025 9:52 AM
> > To: 'Richard Biener' <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; 
> > [email protected]; [email protected]; Chen, Ken <[email protected]>; 
> > Liu, Hongtao <[email protected]>; [email protected]
> > Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> >
> > Thanks Richard for comments.
> >
> > Assume we talk about the form 7 as below:
> >
> > #define DEF_SAT_U_MUL_FMT_7(NT, WT)             \
> > NT __attribute__((noinline))                    \
> > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> > {                                               \
> >   WT x = (WT)a * (WT)b;                         \
> >   NT max = -1;                                  \
> >   bool overflow_p = x > (WT)(max);              \
> >   return -(NT)(overflow_p) | (NT)x;             \
> > }
> >
> > > So why is matching the conversion necessary at all, that is, why is (mult 
> > > @0 @1)
> > > not sufficient here?
> >
> > Because there are many different types combinations, unsigned SAT_MUL need 
> > the help of a wider type.
> > For example:
> >
> > If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul:
> >   21   │   _12 = (unsigned long) a_6(D);
> >   22   │   _13 = (unsigned long) b_7(D);
> >   23   │   x_8 = _12 w* _13;
> >
> > If NT is uint64_t, WT is uint128_t, we don't have the convert.
> >   19   │   x_8 = a_6(D) w* b_7(D);
> >
> > > You are not looking at the types of @0 or @1 at
> > > all, AFAICS
> > > one could be 'char' and one could be 'short'?
> >
> > The type is restricted by (if (types_match (type, @0, @1)). Aka the result 
> > must have the same type as @0 and @1.
> >
> > > Captures on optional nodes are "merged" to the operand, so for 
> > > (convert?@4 @0)
> > > when there is no conversion @4 == @0.  That is it behaves as if there 
> > > were two
> > > captures at the place of @0.
> >
> > I see, sounds like alias here, thanks for the explanation.
> >
> > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> > > desired type?
> > > That is, I wonder if even when the conversion is missing, the
> > > saturated operation
> > > can be matched by using (original-type)SAT_UXYZ (...), implying the 
> > > SAT_UXYZ
> > > produces a uint8_t (in this case)?  Not sure I phrased this in an
> > > understandable way ;)
> >
> > > I'll note that the outer conversion is poorly constrained given the 
> > > conversion
> > > around the multiplication/inside the negate is what determines the
> > > semantics of it
> > > and that's not constrained either?
> >
> > For NT is uint8_t and WT is uint128_t, we have the final convert from 
> > gimple. The bigger types mentioned in above
> > don't have such convert.  The _11 and _6 should be almost the same, maybe 
> > nop_convert here is good enough if it failed
> > to match that.  I will have a try and keep you posted.
> >
> >   14   │   signed char _6;
> >   15   │   uint8_t _11;
> >  ...
> >   26   │   _3 = (signed char) overflow_p_10;
> >   27   │   _4 = -_3;
> >   28   │   _5 = (signed char) x_9;
> >   29   │   _6 = _4 | _5;
> >   30   │   _11 = (uint8_t) _6;
> >   31   │   return _11;
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <[email protected]>
> > Sent: Thursday, October 23, 2025 7:08 PM
> > To: Li, Pan2 <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; 
> > [email protected]; [email protected]; Chen, Ken <[email protected]>; 
> > Liu, Hongtao <[email protected]>; [email protected]
> > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> >
> > On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote:
> > >
> > > Thanks Richard for comments.
> > >
> > > > I'm a bit confused as to this for and the explicit widen_mult case in 
> > > > the 2nd
> > > > pattern below.  In fact, the patterns look similar enough to me and I 
> > > > wonder
> > > > whether the consumer can handle the outer conversion and the conversion
> > > > of the leafs of the multiplication?  The conversion of the leafs 
> > > > doesn't have
> > > > any constraints, nor is it obvious why there's an outer conversion 
> > > > around the
> > > > IOR.
> > >
> > > The two pattern looks similar up to a point. I separate them into 2 
> > > mostly comes from
> > > the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)). 
> > > Some diff
> > > type combination need to convert while others are not.
> > >
> > > I tried to leverage something like (mult (convert?@4 @0) (convert?@5 @1)) 
> > > to put
> > > them together. But it may has additional sematics because add '?' after 
> > > convert will
> > > cover (mult (convert@4 @0) @1) which is not expected.
> >
> > So why is matching the conversion necessary at all, that is, why is (mult 
> > @0 @1)
> > not sufficient here?  You are not looking at the types of @0 or @1 at
> > all, AFAICS
> > one could be 'char' and one could be 'short'?
> >
> > > Another problem about add '?' after convert is capture @4/@5, if there is 
> > > no need
> > > to convert, I am not sure how to take care of the @4 from the “with“” 
> > > scope because it
> > > is optional. The code acts on capture @4/@5 is not correct when there is 
> > > no convert.
> >
> > Captures on optional nodes are "merged" to the operand, so for (convert?@4 
> > @0)
> > when there is no conversion @4 == @0.  That is it behaves as if there were 
> > two
> > captures at the place of @0.
> >
> > > I may have another try to merge them into one, but is there any 
> > > suggestion here?
> > >
> > > > outer conversion around the
> > > > IOR.
> > >
> > > Some type(s) may need to do a convert to the final type, like 
> > > sat_u_mul-8-u8-from-u32,
> > > The seq may look like blow.
> > >
> > >   25   │   _6 = _4 | _5;
> > >   26   │   _11 = (uint8_t) _6;
> > >   27   │   return _11;
> >
> > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> > desired type?
> > That is, I wonder if even when the conversion is missing, the
> > saturated operation
> > can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
> > produces a uint8_t (in this case)?  Not sure I phrased this in an
> > understandable way ;)
> >
> > I'll note that the outer conversion is poorly constrained given the 
> > conversion
> > around the multiplication/inside the negate is what determines the
> > semantics of it
> > and that's not constrained either?
> >
> > Richard.
> >
> > > Pan
> > >
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: Wednesday, October 22, 2025 5:00 PM
> > > To: Li, Pan2 <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; 
> > > [email protected]; [email protected]; Chen, Ken 
> > > <[email protected]>; Liu, Hongtao <[email protected]>; 
> > > [email protected]
> > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
> > >
> > > On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote:
> > > >
> > > > From: Pan Li <[email protected]>
> > > >
> > > > This patch would like to try to match the the unsigned
> > > > SAT_MUL form 7, aka below:
> > > >
> > > >   #define DEF_SAT_U_MUL_FMT_7(NT, WT)             \
> > > >   NT __attribute__((noinline))                    \
> > > >   sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> > > >   {                                               \
> > > >     WT x = (WT)a * (WT)b;                         \
> > > >     NT max = -1;                                  \
> > > >     bool overflow_p = x > (WT)(max);              \
> > > >     return -(NT)(overflow_p) | (NT)x;             \
> > > >   }
> > > >
> > > > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and
> > > > NT is uint64_t, uint32_t, uint16_t or uint8_t.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * match.pd: Add pattern for SAT_MUL form 5 include
> > > >         mul and widen_mul.
> > > >
> > > > Signed-off-by: Pan Li <[email protected]>
> > > > ---
> > > >  gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 35 insertions(+)
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index bfc51e6579a..0f55a82d989 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >        bool widen_mult_p = prec * 2 == widen_prec;
> > > >       }
> > > >       (if (c2_is_type_precision_p && widen_mult_p)))))
> > > > + (for mult_op (mult widen_mult)
> > >
> > > I'm a bit confused as to this for and the explicit widen_mult case in the 
> > > 2nd
> > > pattern below.  In fact, the patterns look similar enough to me and I 
> > > wonder
> > > whether the consumer can handle the outer conversion and the conversion
> > > of the leafs of the multiplication?  The conversion of the leafs doesn't 
> > > have
> > > any constraints, nor is it obvious why there's an outer conversion around 
> > > the
> > > IOR.
> > >
> > > > +  (match (unsigned_integer_sat_mul @0 @1)
> > > > +   (convert?
> > > > +    (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > > > +            (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
> > > > +   (if (types_match (type, @0, @1))
> > > > +    (with
> > > > +     {
> > > > +      unsigned prec = TYPE_PRECISION (type);
> > > > +      unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > > > +      unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4));
> > > > +      unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> > > > +
> > > > +      wide_int max = wi::mask (prec, false, widen_prec);
> > > > +      bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > > > +      bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec == 
> > > > cvt5_prec
> > > > +       && widen_prec == cvt5_prec * 2;
> > > > +      bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec
> > > > +       && cvt4_prec == widen_prec && widen_prec > prec;
> > > > +     }
> > > > +     (if (c2_is_max_p && (mult_p || widen_mult_p)))))))
> > > > + (match (unsigned_integer_sat_mul @0 @1)
> > > > +  (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > > > +          (convert (widen_mult:c@3 @0 @1)))
> > > > +  (if (types_match (type, @0, @1))
> > > > +   (with
> > > > +    {
> > > > +     unsigned prec = TYPE_PRECISION (type);
> > > > +     unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > > > +
> > > > +     wide_int max = wi::mask (prec, false, widen_prec);
> > > > +     bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > > > +     bool widen_mult_p = prec * 2 == widen_prec;
> > > > +    }
> > > > +    (if (c2_is_max_p && widen_mult_p)))))
> > > >  )
> > > >
> > > >  /* The boundary condition for case 10: IMM = 1:
> > > > --
> > > > 2.43.0
> > > >

RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7

Reply via email to