> -----Original Message-----
> From: Jan Hubicka <hubi...@ucw.cz>
> Sent: Friday, April 25, 2025 12:27 AM
> To: Liu, Hongtao <hongtao....@intel.com>
> Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> Subject: Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.
>
> > Since ix86_expand_sse_movcc will simplify them into a simple vmov,
> > vpand or vpandn.
> > Current register_operand/vector_operand could lose some optimization
> > opportunity.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
> > (nonimm_or_0_or_1s_operand): Ditto.
> > * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> > Extend the predicate of operands1 to accept 0 or allones
> > operands.
> > (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > (vcond_mask_v1tiv1ti): Ditto.
> > (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
> > operands[3].
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/blendv-to-maxmin.c: New test.
> > * gcc.target/i386/blendv-to-pand.c: New test.
>
> > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > new file mode 100644
> > index 00000000000..042eb7d8f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > +
> > +double
> > +foo (double a)
> > +{
> > + if (a > 0.0)
> > + return a;
> > + return 0.0;
> > +}
>
> With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> -ffast-math we cannot do that, since MAX_EXPR (and RTL SMAX) are explicitly
> documented as unspecified when one of the parameters is NaN.
>
> So without -ffast-math at combine time we see:
> (insn 6 3 7 2 (set (reg:DF 103)
> (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> (nil))
> (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> (unspec:DF [
> (reg:DF 104 [ a ])
> (reg:DF 103)
> ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> (expr_list:REG_DEAD (reg:DF 104 [ a ])
> (expr_list:REG_DEAD (reg:DF 103)
> (nil))))
>
> maxss is defined as:
>
> MAX(SRC1, SRC2)
> {
> IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> ELSE DEST := SRC2;
> FI;
> }
>
> which I think translates to
> SRC1 > SRC2 ? SRC1 : SRC2
Yes, for minss/maxss
>
> If SRC1 and SRC2 are both 0, this should evaluate to false and return SRC2; if
> one of them is NaN, this should also evaluate to false and return SRC2
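As a sanity check of that reading, here is a minimal C sketch that compares a scalar model of the quoted pseudocode against the plain comparison-and-select form. The helper names (model_max, ternary_max, agree) are made up for illustration and are not GCC or Intel APIs, and the model assumes the pseudocode quoted above describes the instruction faithfully.

```c
#include <string.h>

/* Hypothetical scalar model of the quoted MAX(SRC1, SRC2) pseudocode.
   NaN is detected with x != x, so do not build this with -ffast-math.  */
static double
model_max (double src1, double src2)
{
  if (src1 == 0.0 && src2 == 0.0)
    return src2;		/* Both zero (any sign): second operand.  */
  if (src1 != src1)
    return src2;		/* SRC1 is NaN: second operand.  */
  if (src2 != src2)
    return src2;		/* SRC2 is NaN: second operand.  */
  if (src1 > src2)
    return src1;
  return src2;
}

/* The direct form: a greater-than comparison feeding a select.  */
static double
ternary_max (double src1, double src2)
{
  return src1 > src2 ? src1 : src2;
}

/* Return 1 if both forms give bit-identical results, so that
   +0.0 vs -0.0 and NaN results are compared exactly.  */
static int
agree (double src1, double src2)
{
  double m = model_max (src1, src2);
  double t = ternary_max (src1, src2);
  return memcmp (&m, &t, sizeof m) == 0;
}
```

In terms of the testcase, foo (a) corresponds to ternary_max (a, 0.0): src2 is the constant 0.0, so a NaN or -0.0 input returns 0.0 in both forms.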
>
> so it seems to handle the corner cases correctly and has a direct RTL
> equivalent. So why do we need UNSPEC_IEEE_MAX at all? Expressing this in RTL
> directly would enable RTL passes to do a better job.
> Similarly for BLENDV...
Note that blendv checks only the most significant bit of each mask element, so
it is not a simple
  if_then_else
    mask
    if_true
    if_false
It should be
  if_then_else
    ashiftrt mask 31
    if_true
    if_false
That may not be very useful in practice, which is the same reason
UNSPEC_FMADDSUB exists:

;; It would be possible to represent these without the UNSPEC as
;;
;; (vec_merge
;;   (fma op1 op2 op3)
;;   (fma op1 op2 (neg op3))
;;   (merge-const))
;;
;; But this doesn't seem useful in practice.

(define_expand "vec_fmaddsub<mode>4"
  [(set (match_operand:VFH 0 "register_operand")
        (unspec:VFH
          [(match_operand:VFH 1 "nonimmediate_operand")
           (match_operand:VFH 2 "nonimmediate_operand")
           (match_operand:VFH 3 "nonimmediate_operand")]
          UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")

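To make the blendv point concrete, here is a rough per-element C sketch for 32-bit elements. The function names are hypothetical, and the if_true/if_false naming follows the RTL sketch above rather than the instruction's operand order. The two selects agree when the mask is all-ones or zero (as produced by a vector compare) but differ for any other nonzero mask with a clear sign bit.

```c
#include <stdint.h>

/* Hypothetical model of one 32-bit blendv element: only bit 31 of
   MASK decides which input is selected.  */
static uint32_t
blendv_elt (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return (mask >> 31) ? if_true : if_false;
}

/* A plain "mask is nonzero" select, for comparison.  */
static uint32_t
nonzero_select_elt (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return mask ? if_true : if_false;
}
```

For example, with mask 0x7fffffff the sign-bit select picks if_false while the nonzero select picks if_true.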
>
> Honza