Thanks Richard for comments.

Assume we talk about the form 7 as below:

#define DEF_SAT_U_MUL_FMT_7(NT, WT)             \
NT __attribute__((noinline))                    \
sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
{                                               \
  WT x = (WT)a * (WT)b;                         \
  NT max = -1;                                  \
  bool overflow_p = x > (WT)(max);              \
  return -(NT)(overflow_p) | (NT)x;             \
}

> So why is matching the conversion necessary at all, that is, why is (mult @0 
> @1)
> not sufficient here?  

Because there are many different types combinations, unsigned SAT_MUL need the 
help of a wider type.
For example:

If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul:
  21   │   _12 = (unsigned long) a_6(D);
  22   │   _13 = (unsigned long) b_7(D);
  23   │   x_8 = _12 w* _13;

If NT is uint64_t, WT is uint128_t, we don't have the convert.
  19   │   x_8 = a_6(D) w* b_7(D);

> You are not looking at the types of @0 or @1 at
> all, AFAICS
> one could be 'char' and one could be 'short'?

The type is restricted by (if (types_match (type, @0, @1)). Aka the result must 
have the same type as @0 and @1.

> Captures on optional nodes are "merged" to the operand, so for (convert?@4 @0)
> when there is no conversion @4 == @0.  That is it behaves as if there were two
> captures at the place of @0.

I see, sounds like alias here, thanks for the explanation.

> But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> desired type?
> That is, I wonder if even when the conversion is missing, the
> saturated operation
> can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
> produces a uint8_t (in this case)?  Not sure I phrased this in an
> understandable way ;)

> I'll note that the outer conversion is poorly constrained given the conversion
> around the multiplication/inside the negate is what determines the
> semantics of it
> and that's not constrained either?

For NT is uint8_t and WT is uint128_t, we have the final convert from gimple. 
The bigger types mentioned in above
don't have such convert.  The _11 and _6 should be almost the same, maybe 
nop_convert here is good enough if it failed
to match that.  I will have a try and keep you posted.

  14   │   signed char _6;
  15   │   uint8_t _11;
 ...
  26   │   _3 = (signed char) overflow_p_10;
  27   │   _4 = -_3;
  28   │   _5 = (signed char) x_9;
  29   │   _6 = _4 | _5;
  30   │   _11 = (uint8_t) _6;
  31   │   return _11;

Pan


-----Original Message-----
From: Richard Biener <[email protected]> 
Sent: Thursday, October 23, 2025 7:08 PM
To: Li, Pan2 <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; 
[email protected]; [email protected]; Chen, Ken <[email protected]>; 
Liu, Hongtao <[email protected]>; [email protected]
Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7

On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote:
>
> Thanks Richard for comments.
>
> > I'm a bit confused as to this for and the explicit widen_mult case in the 
> > 2nd
> > pattern below.  In fact, the patterns look similar enough to me and I wonder
> > whether the consumer can handle the outer conversion and the conversion
> > of the leafs of the multiplication?  The conversion of the leafs doesn't 
> > have
> > any constraints, nor is it obvious why there's an outer conversion around 
> > the
> > IOR.
>
> The two pattern looks similar up to a point. I separate them into 2 mostly 
> comes from
> the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)). Some 
> diff
> type combination need to convert while others are not.
>
> I tried to leverage something like (mult (convert?@4 @0) (convert?@5 @1)) to 
> put
> them together. But it may has additional sematics because add '?' after 
> convert will
> cover (mult (convert@4 @0) @1) which is not expected.

So why is matching the conversion necessary at all, that is, why is (mult @0 @1)
not sufficient here?  You are not looking at the types of @0 or @1 at
all, AFAICS
one could be 'char' and one could be 'short'?

> Another problem about add '?' after convert is capture @4/@5, if there is no 
> need
> to convert, I am not sure how to take care of the @4 from the “with“” scope 
> because it
> is optional. The code acts on capture @4/@5 is not correct when there is no 
> convert.

Captures on optional nodes are "merged" to the operand, so for (convert?@4 @0)
when there is no conversion @4 == @0.  That is it behaves as if there were two
captures at the place of @0.

> I may have another try to merge them into one, but is there any suggestion 
> here?
>
> > outer conversion around the
> > IOR.
>
> Some type(s) may need to do a convert to the final type, like 
> sat_u_mul-8-u8-from-u32,
> The seq may look like blow.
>
>   25   │   _6 = _4 | _5;
>   26   │   _11 = (uint8_t) _6;
>   27   │   return _11;

But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
desired type?
That is, I wonder if even when the conversion is missing, the
saturated operation
can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
produces a uint8_t (in this case)?  Not sure I phrased this in an
understandable way ;)

I'll note that the outer conversion is poorly constrained given the conversion
around the multiplication/inside the negate is what determines the
semantics of it
and that's not constrained either?

Richard.

> Pan
>
> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Wednesday, October 22, 2025 5:00 PM
> To: Li, Pan2 <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; 
> [email protected]; [email protected]; Chen, Ken <[email protected]>; 
> Liu, Hongtao <[email protected]>; [email protected]
> Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
>
> On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote:
> >
> > From: Pan Li <[email protected]>
> >
> > This patch would like to try to match the the unsigned
> > SAT_MUL form 7, aka below:
> >
> >   #define DEF_SAT_U_MUL_FMT_7(NT, WT)             \
> >   NT __attribute__((noinline))                    \
> >   sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> >   {                                               \
> >     WT x = (WT)a * (WT)b;                         \
> >     NT max = -1;                                  \
> >     bool overflow_p = x > (WT)(max);              \
> >     return -(NT)(overflow_p) | (NT)x;             \
> >   }
> >
> > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and
> > NT is uint64_t, uint32_t, uint16_t or uint8_t.
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Add pattern for SAT_MUL form 5 include
> >         mul and widen_mul.
> >
> > Signed-off-by: Pan Li <[email protected]>
> > ---
> >  gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 35 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index bfc51e6579a..0f55a82d989 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >        bool widen_mult_p = prec * 2 == widen_prec;
> >       }
> >       (if (c2_is_type_precision_p && widen_mult_p)))))
> > + (for mult_op (mult widen_mult)
>
> I'm a bit confused as to this for and the explicit widen_mult case in the 2nd
> pattern below.  In fact, the patterns look similar enough to me and I wonder
> whether the consumer can handle the outer conversion and the conversion
> of the leafs of the multiplication?  The conversion of the leafs doesn't have
> any constraints, nor is it obvious why there's an outer conversion around the
> IOR.
>
> > +  (match (unsigned_integer_sat_mul @0 @1)
> > +   (convert?
> > +    (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > +            (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
> > +   (if (types_match (type, @0, @1))
> > +    (with
> > +     {
> > +      unsigned prec = TYPE_PRECISION (type);
> > +      unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > +      unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4));
> > +      unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> > +
> > +      wide_int max = wi::mask (prec, false, widen_prec);
> > +      bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > +      bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec == 
> > cvt5_prec
> > +       && widen_prec == cvt5_prec * 2;
> > +      bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec
> > +       && cvt4_prec == widen_prec && widen_prec > prec;
> > +     }
> > +     (if (c2_is_max_p && (mult_p || widen_mult_p)))))))
> > + (match (unsigned_integer_sat_mul @0 @1)
> > +  (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > +          (convert (widen_mult:c@3 @0 @1)))
> > +  (if (types_match (type, @0, @1))
> > +   (with
> > +    {
> > +     unsigned prec = TYPE_PRECISION (type);
> > +     unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > +
> > +     wide_int max = wi::mask (prec, false, widen_prec);
> > +     bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > +     bool widen_mult_p = prec * 2 == widen_prec;
> > +    }
> > +    (if (c2_is_max_p && widen_mult_p)))))
> >  )
> >
> >  /* The boundary condition for case 10: IMM = 1:
> > --
> > 2.43.0
> >

Reply via email to