Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

Uros Bizjak Thu, 24 Apr 2025 11:32:10 -0700

On Thu, Apr 24, 2025 at 8:10 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka <hubi...@ucw.cz> wrote:
> >
> > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > or vpandn.
> > > Current register_operand/vector_operand could lose some optimization
> > > opportunity.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/i386/predicates.md (vector_or_0_or_1s_operand): New
> > >       predicate.
> > >       (nonimm_or_0_or_1s_operand): Ditto.
> > >       * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> > >       Extend the predicate of operands1 to accept 0 or allones
> > >       operands.
> > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > >       (vcond_mask_v1tiv1ti): Ditto.
> > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > >       * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
> > >       operands[3].
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/i386/blendv-to-maxmin.c: New test.
> > >       * gcc.target/i386/blendv-to-pand.c: New test.
> >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > new file mode 100644
> > > index 00000000000..042eb7d8f24
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > +
> > > +double
> > > +foo (double a)
> > > +{
> > > +  if (a > 0.0)
> > > +    return a;
> > > +  return 0.0;
> > > +}
> >
> > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > -ffast-math we cannot do that since MAX_EXPR (and RTL SMAX) are
> > explicitly documented as unspecified when one of the parameters is a NaN.
> >
> > So without -ffast-math at combine time we see:
> > (insn 6 3 7 2 (set (reg:DF 103)
> >         (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> >      (nil))
> > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> >         (unspec:DF [
> >                 (reg:DF 104 [ a ])
> >                 (reg:DF 103)
> >             ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> >      (expr_list:REG_DEAD (reg:DF 104 [ a ])
> >         (expr_list:REG_DEAD (reg:DF 103)
> >             (nil))))
> >
> > maxss is defined as:
> >
> > MAX(SRC1, SRC2)
> > {
> >     IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> >         ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> >         ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> >         ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> >         ELSE DEST := SRC2;
> >     FI;
> > }
>
> Please see [1], "Maximum and minimum functions", which says:
>
> "The maxNum and minNum functions defined in the 2008 standard
> propagate a non-NaN when one input is NaN and the other input is a
> normal number.
>
> This problem will be fixed by the forthcoming revision of the
> standard. The new functions named maximum and minimum are certain to
> propagate NaNs.
> Some current implementations are deviating from both of these
> definitions. Max and min instructions in the x86 instruction set are
> implemented so that max(a,b) and min(a,b) give b if one of the inputs
> is NaN. This is useful because it corresponds to the behavior of the
> code expression a > b ? a : b. A compiler can translate this common
> high-level language expression into a single instruction."
>
> Unfortunately, SSE max and min instructions are incompatible with both
> standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> FI;"


Ehm, SSE max and min instructions are incompatible with IEEE 754-2019
because of "ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;" and with
IEEE 754-2008 because of "ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;".
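
To make the two cases concrete, here is a minimal sketch in portable C
that models the MAX(SRC1, SRC2) pseudocode quoted above. It is only a
model under that assumption, not the instruction itself and not code
from the patch; the names sse_max_model and demo_sse_max_vs_ieee are
made up for illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of the quoted MAX(SRC1, SRC2) pseudocode.
   This is plain C, not the hardware instruction.  */
static double
sse_max_model (double src1, double src2)
{
  if (src1 == 0.0 && src2 == 0.0)
    return src2;		/* Both zeros (either sign): second operand.  */
  if (isnan (src1))
    return src2;		/* SRC1 is NaN: second operand.  */
  if (isnan (src2))
    return src2;		/* SRC2 is NaN: second operand, i.e. the NaN.  */
  return src1 > src2 ? src1 : src2;
}

/* Illustrative checks of the two NaN cases discussed above.  */
static void
demo_sse_max_vs_ieee (void)
{
  /* SRC1 = NaN: the model returns the number, whereas a
     NaN-propagating maximum (754-2019) would return NaN.  */
  assert (sse_max_model (NAN, 1.0) == 1.0);

  /* SRC2 = NaN: the model returns the NaN, whereas maxNum (754-2008)
     would return the number, as C's fmax does.  */
  assert (isnan (sse_max_model (1.0, NAN)));
  assert (fmax (1.0, NAN) == 1.0);

  /* In every case the model agrees with the expression a > b ? a : b.  */
  assert (sse_max_model (2.0, 1.0) == (2.0 > 1.0 ? 2.0 : 1.0));
}
```

Calling demo_sse_max_vs_ieee () from a test driver exercises both
cases; the result depends on operand order, which neither standard
revision allows.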

Uros.
