Hi!

On Tue, Mar 27, 2018 at 09:30:35AM +0200, Uros Bizjak wrote:
> +(define_insn "*cmpdd_cmpo"
> +  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
> +        (compare:CCFP (match_operand:DD 1 "gpc_reg_operand" "d")
> +                      (match_operand:DD 2 "gpc_reg_operand" "d")))
> +   (unspec [(match_dup 1) (match_dup 2)] UNSPEC_CMPO)]
> +  "TARGET_DFP"
> +  "dcmpo %0,%1,%2"
> +  [(set_attr "type" "dfp")])
>
> I have had some problems when adding UNSPEC tags as a parallel to a
> compare for x86. For the testcase:
>
> int testo (double a, double b)
> {
>   return a == b;
> }
>
> middle end code emits sequence like:
[ snip ]

> and postreload pass removes (insn 10). This was not the case when the
> compare was implemented with a parallel.

For us this works fine:

	fcmpu 7,1,2
	mfcr 3,1
	rlwinm 3,3,31,1
	blr

(eq is not expanded as an ordered compare, only lt gt le ge are, not the
other twelve).  But say

int testo (double a, double b)
{
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;
}

gives with -ffast-math

	fcmpu 7,1,2
	li 3,-1
	bltlr 7
	mfcr 3,1
	rlwinm 3,3,30,1
	blr

(the two compares were combined, by fwprop1) but without the flag we get

	fcmpo 5,1,2
	li 3,-1
	bltlr 5
	mfcr 3,4
	rlwinm 3,3,22,1
	fcmpo 7,1,2
	blr

(it's still combined, but the redundant compare isn't deleted).

> Also, -ffast-math on x86 emits trapping compares for all cases. For
> that reason, unordered (non-trapping) compares were wrapped in an
> unspec, with the expectation that -ffast-math can perform some more
> optimizations with patterns using naked compare RTX without unspec.

My patch expands with:

+  if (SCALAR_FLOAT_MODE_P (mode) && HONOR_NANS (mode)
+      && (code == LT || code == GT || code == LE || code == GE))
+    {
+      rtx unspec = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (2, op0, op1),
+                                   UNSPEC_CMPO);
+      compare = gen_rtx_PARALLEL (VOIDmode,
+                                  gen_rtvec (2, compare, unspec));
+    }

so we use only unordered compares with -ffast-math (exactly as before the
patch, in all cases).

It would be ideal if there were two separate compare codes in RTL, or some
other way to flag it.  Or something that deletes unused ordered compares
(if they are expressed as a parallel with an unspec).

Are ordered compares faster than unordered on x86?  Strange stuff.


Segher
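The decision made by the expander hunk quoted in this mail can be modelled in a few lines of plain C. This is only a sketch of the condition, not GCC code: the enum cmp_code, the function wants_ordered_compare and its two boolean parameters are hypothetical stand-ins. In GCC the real inputs are the rtx_code and the machine mode, tested with SCALAR_FLOAT_MODE_P and HONOR_NANS. A true result corresponds to wrapping the compare in a PARALLEL with UNSPEC_CMPO (an ordered fcmpo/dcmpo); false means a plain unordered compare (fcmpu).

```c
#include <assert.h>   /* assert() is used by the usage checks */
#include <stdbool.h>

/* Hypothetical stand-ins for some of GCC's floating-point rtx
   comparison codes; the real enum rtx_code is generated from rtl.def. */
enum cmp_code { EQ, NE, LT, GT, LE, GE, UNORDERED, ORDERED,
                UNEQ, UNLT, UNGT, UNLE, UNGE, LTGT };

/* Mirrors the condition in the quoted patch hunk:
   - is_scalar_float stands in for SCALAR_FLOAT_MODE_P (mode);
   - honor_nans stands in for HONOR_NANS (mode), which is false when
     -ffast-math (via -ffinite-math-only) is in effect.
   Returns true when the compare would get the UNSPEC_CMPO wrapper,
   i.e. be emitted as an ordered compare.  */
static bool
wants_ordered_compare (enum cmp_code code, bool is_scalar_float,
                       bool honor_nans)
{
  if (!is_scalar_float || !honor_nans)
    return false;

  switch (code)
    {
    case LT: case GT: case LE: case GE:
      return true;   /* only these four become ordered compares */
    default:
      return false;  /* EQ, NE and the UN* / LTGT codes stay unordered */
    }
}
```

For example, wants_ordered_compare (EQ, true, true) is false, which matches the a == b testcase producing fcmpu, while wants_ordered_compare (LT, true, true) is true. With honor_nans false every call returns false, matching the -ffast-math listing where only fcmpu appears.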