Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

Jan Hubicka Thu, 24 Apr 2025 22:26:54 -0700

> On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka <[email protected]> wrote:
> >
> > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > or vpandn.
> > > Current register_operand/vector_operand could lose some optimization
> > > opportunity.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > > predicate.
> > >       (nonimm_or_0_or_1s_operand): Ditto.
> > >       * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> > >       Extend the predicate of operands1 to accept 0 or allones
> > >       operands.
> > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > >       (vcond_mask_v1tiv1ti): Ditto.
> > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > >       * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
> > >       operands[3].
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/i386/blendv-to-maxmin.c: New test.
> > >       * gcc.target/i386/blendv-to-pand.c: New test.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > new file mode 100644
> > > index 00000000000..042eb7d8f24
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > +
> > > +double
> > > +foo (double a)
> > > +{
> > > +  if (a > 0.0)
> > > +    return a;
> > > +  return 0.0;
> > > +}
> >
> > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> > explicitely documented as unspecified when one of parameters is nan.
> >
> > So without -ffast-math at combine time we see:
> > (insn 6 3 7 2 (set (reg:DF 103)
> >         (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> >      (nil))
> > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> >         (unspec:DF [
> >                 (reg:DF 104 [ a ])
> >                 (reg:DF 103)
> >             ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> >      (expr_list:REG_DEAD (reg:DF 104 [ a ])
> >         (expr_list:REG_DEAD (reg:DF 103)
> >             (nil))))
> >
> > maxss is defined as:
> >
> > MAX(SRC1, SRC2)
> > {
> >     IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> >         ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> >         ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> >         ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> >         ELSE DEST := SRC2;
> >     FI;
> > }
> 
> Please see [1], "Maximum and minimum functions", which says:
> 
> "The maxNum and minNum functions defined in the 2008 standard
> propagate a non-NaN when one input is NaN and the other input is a
> normal number.
> 
> This problem will be fixed by the forthcoming revision of the
> standard. The new functions named maximum and minimum are certain to
> propagate NaNs.
> Some current implementations are deviating from both of these
> definitions. Max and min instructions in the x86 instruction set are
> implemented so that max(a,b) and min(a,b) give b if one of the inputs
> is NaN. This is useful because it corresponds to the behavior of the
> code expression a > b ? a : b. A compiler can translate this common
> high-level language expression into a single instruction."
> 
> Unfortunately, SSE max and min instructions are incompatible with both
> standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> FI;"


I may be missing something obvious, but it still seems to me that RTL's

  (if_then_else:SF
     (gt (reg:SF SRC1) (reg:SF SRC2))
     (reg:SF SRC1)
     (reg:SF SRC2))

Behaves same as the sse's max spec:

MAX(SRC1, SRC2)
{
    IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
        ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
        ELSE DEST := SRC2;
    FI;
}

1) If SRC1 and SRC2 are both 0 (possibly with different signs) then GT is false 
and we pick SRC2
2) If SRC1 or SRC2 is NaN then GT is false and we pick SRC2
3) if SRC1 > SRC2 then GT is true and we pick SRC1
otherwise GT is false and we pick SRC2

And thus it may be more RTL friendly to represent it this way instead of
current unspec called UNSPEC_IEEE_MAX....

Honza

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

Reply via email to