> Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> or vpandn.
> Current register_operand/vector_operand could lose some optimization
> opportunity.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>       * config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
>       (nonimm_or_0_or_1s_operand): Ditto.
>       * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
>       Extend the predicate of operands1 to accept 0 or allones
>       operands.
>       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
>       (vcond_mask_v1tiv1ti): Ditto.
>       (vcond_mask_<mode><sseintvecmodelower>): Ditto.
>       * config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
>       operands[3].
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/i386/blendv-to-maxmin.c: New test.
>       * gcc.target/i386/blendv-to-pand.c: New test.

> diff --git a/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> new file mode 100644
> index 00000000000..042eb7d8f24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-to-maxmin.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64-v3 -O2 -mfpmath=sse" } */
> +/* { dg-final { scan-assembler-times "vmaxsd" 1 } } */
> +
> +double
> +foo (double a)
> +{
> +  if (a > 0.0)
> +    return a;
> +  return 0.0;
> +}

With -ffast-math this is matched as MAX_EXPR at the gimple level. Without
-ffast-math we cannot do that, since MAX_EXPR (and RTL SMAX) are
explicitly documented as unspecified when one of the parameters is a NaN.

So without -ffast-math at combine time we see:
(insn 6 3 7 2 (set (reg:DF 103)
        (const_double:DF 0.0 [0x0.0p+0])) "e.c":7:1 169 {*movdf_internal}
     (nil))
(insn 7 6 12 2 (set (reg:DF 102 [ _2 ])
        (unspec:DF [
                (reg:DF 104 [ a ])
                (reg:DF 103)
            ] UNSPEC_IEEE_MAX)) "e.c":7:1 1825 {*ieee_smaxdf3}
     (expr_list:REG_DEAD (reg:DF 104 [ a ])
        (expr_list:REG_DEAD (reg:DF 103)
            (nil))))

maxss is defined as:

MAX(SRC1, SRC2)
{
    IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
        ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC1 > SRC2) THEN DEST := SRC1;
        ELSE DEST := SRC2;
    FI;
}

which I think translates to
  SRC1 > SRC2 ? SRC1 : SRC2

If SRC1 and SRC2 are both 0, the comparison evaluates to false and returns SRC2.
If one of them is NaN, the comparison evaluates to false and returns SRC2.

So it seems to handle the corner cases correctly and has a direct RTL
equivalent.  So why do we need UNSPEC_IEEE_MAX at all? Expressing this in
RTL directly would enable RTL passes to do a better job.
Similarly for BLENDV...
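
To make the claim above easy to check, here is a minimal C sketch that
transcribes the quoted MAX() pseudocode and compares it bit-for-bit against
the plain conditional form over the interesting corner cases (signed zeros,
infinities, NaN). The function names are made up for illustration and are
not from GCC or the patch. It has to be built without -ffast-math, and it
only models the returned value, not the exception flags.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Direct transcription of the MAX() pseudocode quoted above.  */
static double
max_pseudocode (double src1, double src2)
{
  if (src1 == 0.0 && src2 == 0.0)
    return src2;
  else if (isnan (src1))
    return src2;
  else if (isnan (src2))
    return src2;
  else if (src1 > src2)
    return src1;
  else
    return src2;
}

/* The proposed equivalent: a single ordered comparison and a select.  */
static double
max_ternary (double src1, double src2)
{
  return src1 > src2 ? src1 : src2;
}

/* Compare bit patterns so that -0.0 vs +0.0 and NaN results are
   distinguished, which == on doubles would not do.  */
static int
same_bits (double x, double y)
{
  uint64_t a, b;
  memcpy (&a, &x, sizeof a);
  memcpy (&b, &y, sizeof b);
  return a == b;
}

/* Count the input pairs on which the two formulations disagree.  */
int
count_mismatches (void)
{
  const double vals[] = { 0.0, -0.0, 1.0, -1.0, INFINITY, -INFINITY, NAN };
  const int n = sizeof vals / sizeof vals[0];
  int bad = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (!same_bits (max_pseudocode (vals[i], vals[j]),
                      max_ternary (vals[i], vals[j])))
        bad++;
  return bad;
}

/* Hypothetical helper: aborts if any tested pair disagrees.  */
void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Calling self_check () from a small main is enough to run the comparison.
The test set is a handful of representative values, so this is a sanity
check of the reasoning, not a proof.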

Honza
