Hi!

On Tue, Mar 27, 2018 at 09:30:35AM +0200, Uros Bizjak wrote:
> +(define_insn "*cmpdd_cmpo"
> +  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
> + (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
> +      (match_operand:DD 2 "gpc_reg_operand" "d")))
> +   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
> +  "TARGET_DFP"
> +  "dcmpo %0,%1,%2"
> +  [(set_attr "type" "dfp")])
> 
> I have had some problems when adding UNSPEC tags as a parallel to a
> compare for x86. For the testcase:
> 
> int testo (double a, double b)
> {
>   return a == b;
> }
> 
> middle end code emits sequence like:

[ snip ]

> and postreload pass removes (insn 10). This was not the case when the
> compare was implemented with a parallel.

For us this works fine:

        fcmpu 7,1,2
        mfcr 3,1
        rlwinm 3,3,31,1
        blr

(eq is not expanded as an ordered compare; only lt, gt, le and ge are, not
the other twelve).

But say

int testo (double a, double b)
{
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

gives with -ffast-math

        fcmpu 7,1,2
        li 3,-1
        bltlr 7
        mfcr 3,1
        rlwinm 3,3,30,1
        blr

(the two compares were combined by fwprop1), but without the flag we get

        fcmpo 5,1,2
        li 3,-1
        bltlr 5
        mfcr 3,4
        rlwinm 3,3,22,1
        fcmpo 7,1,2
        blr

(it's still combined, but the redundant compare isn't deleted).

> Also, -ffast-math on x86 emits trapping compares for all cases. For
> that reason, unordered (non-trapping) compares were wrapped in an
> unspec, with the expectation that -ffast-math can perform some more
> optimizations with patterns using naked compare RTX without unspec.

My patch expands with:

+         if (SCALAR_FLOAT_MODE_P (mode) && HONOR_NANS (mode)
+             && (code == LT || code == GT || code == LE || code == GE))
+           {
+             rtx unspec = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (2, op0, op1),
+                                          UNSPEC_CMPO);
+             compare = gen_rtx_PARALLEL (VOIDmode,
+                                         gen_rtvec (2, compare, unspec));
+           }

so we use only unordered compares with -ffast-math (exactly as before
the patch, in all cases).
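
To spell out the condition in that hunk, here is a minimal standalone sketch
of the same decision.  Everything named in it (enum cmp_code, the CMP_*
values, wants_ordered_compare) is a hypothetical stand-in for illustration,
not GCC's real rtx_code or any function in the patch; the two bool
parameters stand in for SCALAR_FLOAT_MODE_P (mode) and HONOR_NANS (mode).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few RTL comparison codes.  These are NOT
   GCC's real enum rtx_code values; they exist only for this sketch.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE, CMP_UNORDERED, CMP_UNLT
};

/* Return true if the compare should get the extra UNSPEC_CMPO marker,
   i.e. be emitted as an ordered (trapping on NaN) compare.  Mirrors the
   condition in the hunk above: scalar float mode, NaNs honored, and the
   code is one of LT, GT, LE, GE.  Everything else stays unordered.  */
static bool
wants_ordered_compare (bool is_scalar_float, bool honor_nans,
                       enum cmp_code code)
{
  if (!is_scalar_float || !honor_nans)
    return false;

  return (code == CMP_LT || code == CMP_GT
          || code == CMP_LE || code == CMP_GE);
}
```

With honor_nans false (the -ffast-math case) this returns false for every
code, which is why nothing changes there.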

It would be ideal if there were two separate compare codes in RTL, or
some other way to flag it.  Alternatively, something could delete unused
ordered compares (if they are expressed as a parallel with an unspec).

Are ordered compares faster than unordered on x86?  Strange stuff.


Segher
