> -----Original Message-----
> From: Jan Hubicka <hubi...@ucw.cz>
> Sent: Friday, April 25, 2025 12:27 AM
> To: Liu, Hongtao <hongtao....@intel.com>
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.
> 
> > ix86_expand_sse_movcc will simplify them into a simple vmov, vpand or
> > vpandn, whereas the current register_operand/vector_operand predicates
> > could lose some optimization opportunities.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >     * config/i386/predicates.md (vector_or_0_or_1s_operand): New
> >     predicate.
> >     (nonimm_or_0_or_1s_operand): Ditto.
> >     * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> >     Extend the predicate of operands[1] to accept 0 or allones
> >     operands.
> >     (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> >     (vcond_mask_v1tiv1ti): Ditto.
> >     (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> >     * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
> >     operands[3].
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.target/i386/blendv-to-maxmin.c: New test.
> >     * gcc.target/i386/blendv-to-pand.c: New test.
> 
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > new file mode 100644
> > index 00000000000..042eb7d8f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > +
> > +double
> > +foo (double a)
> > +{
> > +  if (a > 0.0)
> > +    return a;
> > +  return 0.0;
> > +}
> 
> With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> -ffast-math we cannot do that, since MAX_EXPR (and RTL SMAX) are explicitly
> documented as unspecified when one of the parameters is NaN.
> 
> So without -ffast-math at combine time we see:
> (insn 6 3 7 2 (set (reg:DF 103)
>         (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
>      (nil))
> (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
>         (unspec:DF [
>                 (reg:DF 104 [ a ])
>                 (reg:DF 103)
>             ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
>      (expr_list:REG_DEAD (reg:DF 104 [ a ])
>         (expr_list:REG_DEAD (reg:DF 103)
>             (nil))))
> 
> maxss is defined as:
> 
> MAX(SRC1, SRC2)
> {
>     IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
>         ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
>         ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
>         ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
>         ELSE DEST := SRC2;
>     FI;
> }
> 
> which I think translates to
>   SRC1 > SRC2 ? SRC1 : SRC2
Yes, for minss/maxss
> 
> If SRC1 and SRC2 are both 0, this should evaluate to false and return SRC2; if
> one of them is NaN, this should also evaluate to false and return SRC2.
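
To make those corner cases concrete, here is a minimal C sketch. It is only a
scalar model of the selection rule, not what GCC emits; x86_max and
check_x86_max are hypothetical names made up for this mail, and the model
assumes IEEE comparisons (i.e. no -ffast-math).

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not a GCC or Intel API: models the maxsd result
   as the plain conditional "SRC1 > SRC2 ? SRC1 : SRC2".  */
double
x86_max (double src1, double src2)
{
  return src1 > src2 ? src1 : src2;
}

/* Walk through the corner cases from the pseudocode above.  */
void
check_x86_max (void)
{
  assert (x86_max (2.0, 1.0) == 2.0);        /* ordinary case */
  assert (x86_max (NAN, 1.0) == 1.0);        /* SRC1 is NaN -> SRC2 */
  assert (isnan (x86_max (1.0, NAN)));       /* SRC2 is NaN -> SRC2 */
  assert (signbit (x86_max (0.0, -0.0)));    /* both zero -> SRC2 (-0.0) */
  assert (!signbit (x86_max (-0.0, 0.0)));   /* both zero -> SRC2 (+0.0) */
}
```

In every case the comparison is false whenever a NaN or two zeros are
involved, so the second operand falls through, which matches the pseudocode.
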
> 
> So it seems to handle the corner cases correctly and has a direct RTL
> equivalent.  So why do we need UNSPEC_IEEE_MAX at all? Expressing this in RTL
> directly would enable RTL passes to do a better job.
> Similarly for BLENDV...
Note that blendv checks only the most significant bit of each mask element,
so it is not a simple
if_then_else
   mask
   if_true
   if_false

For 32-bit elements it should be
if_then_else
   ashiftrt mask 31
   if_true
   if_false
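
A small C sketch of the difference, for one 32-bit lane. Both function names
are hypothetical and only model the semantics; reading a bare if_then_else as
"nonzero mask selects if_true" is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane model of blendvps: only bit 31 of the mask
   decides which input is taken.  */
uint32_t
blendv_lane (uint32_t if_false, uint32_t if_true, uint32_t mask)
{
  /* Portable stand-in for "ashiftrt mask 31": all-ones if the sign bit
     is set, zero otherwise.  */
  uint32_t sel = 0u - (mask >> 31);
  return (if_true & sel) | (if_false & ~sel);
}

/* Assumed naive reading of if_then_else: any nonzero mask selects
   if_true.  */
uint32_t
nonzero_select_lane (uint32_t if_false, uint32_t if_true, uint32_t mask)
{
  return mask != 0 ? if_true : if_false;
}
```

The two agree for masks of 0 and all-ones, which is what vcond_mask normally
sees, but a mask such as 1 picks if_false in blendv_lane and if_true in
nonzero_select_lane.
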

Spelling this out in RTL may not be very useful in practice, which is the same
reason UNSPEC_FMADDSUB exists:


;; It would be possible to represent these without the UNSPEC as
;;
;; (vec_merge
;;   (fma op1 op2 op3)
;;   (fma op1 op2 (neg op3))
;;   (merge-const))
;;
;; But this doesn't seem useful in practice.

(define_expand "vec_fmaddsub<mode>4"
  [(set (match_operand:VFH 0 "register_operand")
        (unspec:VFH
          [(match_operand:VFH 1 "nonimmediate_operand")
           (match_operand:VFH 2 "nonimmediate_operand")
           (match_operand:VFH 3 "nonimmediate_operand")]
          UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")

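
For reference, the vec_merge form in that comment can be read as the
following scalar loop. This is a rough sketch only: fmaddsub_model is a
hypothetical name, the lane assignment (even lanes subtract, odd lanes add)
is my reading of the vfmaddsub description in the Intel manual, and the plain
a * b + c rounds twice where a real FMA rounds once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of fmaddsub over n lanes.  Single rounding
   of a fused multiply-add is NOT modeled here.  */
void
fmaddsub_model (double *dst, const double *a, const double *b,
                const double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      double add = a[i] * b[i] + c[i];     /* (fma op1 op2 op3) */
      double sub = a[i] * b[i] + -c[i];    /* (fma op1 op2 (neg op3)) */
      /* merge-const: odd lanes take the add, even lanes the sub.  */
      dst[i] = (i & 1) ? add : sub;
    }
}
```

With a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {1, 1, 1, 1} this gives
{1, 5, 5, 9}.
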

> 
> Honza
