Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

Hongtao Liu Thu, 24 Apr 2025 23:11:15 -0700

On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka <[email protected]> wrote:
>
> > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka <[email protected]> wrote:
> > >
> > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > > or vpandn.
> > > > Current register_operand/vector_operand could lose some optimization
> > > > opportunity.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >       * config/i386/predicates.md (vector_or_0_or_1s_operand): New 
> > > > predicate.
> > > >       (nonimm_or_0_or_1s_operand): Ditto.
> > > >       * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> > > >       Extend the predicate of operands1 to accept 0 or allones
> > > >       operands.
> > > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > > >       (vcond_mask_v1tiv1ti): Ditto.
> > > >       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
> > > >       * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
> > > >       operands[3].
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >       * gcc.target/i386/blendv-to-maxmin.c: New test.
> > > >       * gcc.target/i386/blendv-to-pand.c: New test.
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c 
> > > > b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > > new file mode 100644
> > > > index 00000000000..042eb7d8f24
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> > > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> > > > +
> > > > +double
> > > > +foo (double a)
> > > > +{
> > > > +  if (a > 0.0)
> > > > +    return a;
> > > > +  return 0.0;
> > > > +}
> > >
> > > With -ffast-math this is matched as MAX_EXPR at gimple level. Without
> > > -ffast-math we can not do that since MAX_EXPR (and RTL SMAX) are
> > > explicitely documented as unspecified when one of parameters is nan.
> > >
> > > So without -ffast-math at combine time we see:
> > > (insn 6 3 7 2 (set (reg:DF 103)
> > >         (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
> > >      (nil))
> > > (insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
> > >         (unspec:DF [
> > >                 (reg:DF 104 [ a ])
> > >                 (reg:DF 103)
> > >             ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
> > >      (expr_list:REG_DEAD (reg:DF 104 [ a ])
> > >         (expr_list:REG_DEAD (reg:DF 103)
> > >             (nil))))
> > >
> > > maxss is defined as:
> > >
> > > MAX(SRC1, SRC2)
> > > {
> > >     IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
> > >         ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
> > >         ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
> > >         ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
> > >         ELSE DEST := SRC2;
> > >     FI;
> > > }
> >
> > Please see [1], "Maximum and minimum functions", which says:
> >
> > "The maxNum and minNum functions defined in the 2008 standard
> > propagate a non-NaN when one input is NaN and the other input is a
> > normal number.
> >
> > This problem will be fixed by the forthcoming revision of the
> > standard. The new functions named maximum and minimum are certain to
> > propagate NaNs.
> > Some current implementations are deviating from both of these
> > definitions. Max and min instructions in the x86 instruction set are
> > implemented so that max(a,b) and min(a,b) give b if one of the inputs
> > is NaN. This is useful because it corresponds to the behavior of the
> > code expression a > b ? a : b. A compiler can translate this common
> > high-level language expression into a single instruction."
> >
> > Unfortunately, SSE max and min instructions are incompatible with both
> > standard revisions due to "ELSE IF (SRC1 = NaN) THEN DEST := SRC2;
> > FI;"
>
> I may be missing something obvious, but it still seems to me that RTL's
>
>   (if_then_else:SF
>      (gt (reg:SF SRC1) (reg:SF SRC2))
>      (reg:SF SRC1)
>      (reg:SF SRC2))
>
> Behaves same as the sse's max spec:
>
> MAX(SRC1, SRC2)
> {
>     IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
>         ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
>         ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
>         ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
>         ELSE DEST := SRC2;
>     FI;
> }
>
> 1) If SRC1 and SRC2 are both 0 (possibly with different signs) then GT is 
> false and we pick SRC2
> 2) If SRC1 or SRC2 is NaN then GT is false and we pick SRC2
> 3) if SRC1 > SRC2 then GT is true and we pick SRC1
> otherwise GT is false and we pick SRC2
>
> And thus it may be more RTL friendly to represent it this way instead of
> current unspec called UNSPEC_IEEE_MAX...


There's a patch proposed for that [1], and Jakub has some comments.

Jakub Jelinek <[email protected]> 于2024年11月15日周五 16:20写道：
>
> On Fri, Nov 15, 2024 at 04:04:55PM +0800, Hongyu Wang wrote:
> > Following the discussion in pr116738, the insn for UNSPEC_IEEE_MAXMIN
> > actually matches the behavior of if_then_else, so remove the UNSPEC and
> > rewrite related pattern with if_then_else.
>
> I'm not sure if it is necessarily always a win.
> It has an advantage that simplify-rtx can simplify it if the operands are
> constant and better describes what's going on.
> The disadvantage I see is that it has multiple uses of the operands, each
> operand is used twice rather than once in the pattern, which could lead to
> combiner or other optimizations giving up on it.
>
> So, do you see any code generation changes with this patch on something
> larger?

with Hongyu's replied
---
I ran spec2017 with O2 config and found some bad changes such as vmaxsd
does not use mem operands. The reload creates extra load for the new pattern.
So maybe we need to preserve current UNSPEC.
---

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668892.html
>
> Honza



-- 
BR,
Hongtao

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

Reply via email to