work204-bugs)] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

Michael Meissner via Gcc-cvs Tue, 13 May 2025 20:01:18 -0700

https://gcc.gnu.org/g:f00b9ee35f2190626fa3cfa8c4fc729714710895


commit f00b9ee35f2190626fa3cfa8c4fc729714710895
Author: Michael Meissner <meiss...@linux.ibm.com>
Date:   Tue May 13 23:00:48 2025 -0400

    Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
    
    In bug PR target/118541 on power9, power10, and power11 systems, for the
    function:
    
            extern double __ieee754_acos (double);
    
            double
            __acospi (double x)
            {
              double ret = __ieee754_acos (x) / 3.14;
              return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
            }
    
    GCC currently generates the following code:
    
            Power9                          Power10 and Power11
            ======                          ===================
            bl __ieee754_acos               bl __ieee754_acos@notoc
            nop                             plfd 0,.LC0@pcrel
            addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
            addi 1,1,32                     addi 1,1,32
            lfd 0,.LC2@toc@l(9)             ld 0,16(1)
            addis 9,2,.LC0@toc@ha           fdiv 0,1,0
            ld 0,16(1)                      mtlr 0
            lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
            fdiv 0,1,0                      xxsel 1,0,12,1
            mtlr 0                          blr
            xscmpgtdp 1,0,12
            xxsel 1,0,12,1
            blr
    
    This is because ifcvt.c optimizes the conditional floating point move to 
use the
    XSCMPGTDP instruction.
    
    However, the XSCMPGTDP instruction will generate an interrupt if one of the
    arguments is a signalling NaN and signalling NaNs can generate an interrupt.
    The IEEE comparison functions (isgreater, etc.) require that the comparison 
not
    raise an interrupt.
    
    The following patch changes the PowerPC back end so that ifcvt.c will not 
change
    the if/then test and move into a conditional move if the comparison is one 
of
    the comparisons that do not raise an error with signalling NaNs and -Ofast 
is
    not used.  If a normal comparison is used, GCC will continue to generate
    XSCMPGTDP and XXSEL.
    
    In the previous version of the patch if -Ofast was used, isgreater and other
    IEEE comparison functions would generate the XSCMPGTDP and XSCMPGEDP
    instructions.  This version of the patch just removes trying to invert using
    inverted fpmasks at all, and if UN{GT,GE,LT,LE} are generated, it will 
always
    fall back to doing a comparison and a jump.
    
    However if -ffinite-math-only (or -Ofast or -ffast-math) is used, the 
machine
    independent portion of the compiler will convert UNGT into GT, and in that 
case
    we will generate XSCMPGTDP, etc.
    
    For the following code:
    
            double ordered_compare (double a, double b, double c, double d)
            {
              return __builtin_isgreater (a, b) ? c : d;
            }
    
            /* Verify normal > does generate xscmpgtdp.  */
    
            double
            normal_compare (double a, double b, double c, double d)
            {
              return a > b ? c : d;
            }
    
    with the following patch, GCC generates the following for power9, power10, 
and
    power11:
    
            ordered_compare:
                    fcmpu 0,1,2
                    fmr 1,4
                    bnglr 0
                    fmr 1,3
                    blr
    
            normal_compare:
                    xscmpgtdp 1,1,2
                    xxsel 1,4,3,1
                    blr
    
    I have built bootstrap compilers on big endian power9 systems and little 
endian
    power9/power10 systems and there were no regressions.  Can I check this 
patch
    into the GCC trunk, and after a waiting period, can I check this into the 
active
    older branches?
    
    2025-05-13  Michael Meissner  <meiss...@linux.ibm.com>
    
    gcc/
    
            PR target/118541
            * config/rs6000/predicates.md (invert_fpmask_comparison_operator):
            Delete predicate, since it was only used to reverse a fpmask 
comparison
            that generates XSCMP{EQ,GT,GE}DP instruction for ordered compares.
            Because those instructions can trap on signalling NaNs, we can't
            generate the code for ordered compares.
            * config/rs6000/rs6000-protos.h (enum rev_cond_ordered): New 
enumeration.
            (rs6000_reverse_condition): Add argument.
            * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
            ordered comparisons to be reversed for floating point conditional 
moves,
            but allow ordered comparisons to be reversed on jumps.
            (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
            * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
            * config/rs6000/rs6000.md (mov<SFDF:mode><SFDF2:mode>cc_invert_p):
            Delete insn, since it was only used to reverse a fpmask comparison
            that generates XSCMP{EQ,GT,GE}DP instruction for ordered compares.
            Because those instructions can trap on signalling NaNs, we can't
            generate the code for ordered compares.
            (mov<mode>cc_invert_p10): Likewise.
            (reverse_branch_comparison): Name insn.  Adjust 
rs6000_reverse_condition
            calls.

Diff:
---
 gcc/config/rs6000/predicates.md             |   6 --
 gcc/config/rs6000/rs6000-protos.h           |  17 +++-
 gcc/config/rs6000/rs6000.cc                 |  34 +++++--
 gcc/config/rs6000/rs6000.h                  |  10 +-
 gcc/config/rs6000/rs6000.md                 |  99 ++++---------------
 gcc/testsuite/gcc.target/powerpc/pr118541.c | 141 ++++++++++++++++++++++++++++
 6 files changed, 206 insertions(+), 101 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 647e89afb6a7..02ba8fa6c9b0 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1463,12 +1463,6 @@
 (define_predicate "fpmask_comparison_operator"
   (match_code "eq,gt,ge"))
 
-;; Return 1 if OP is a comparison operator suitable for vector/scalar
-;; comparisons that generate a 0/-1 mask (i.e. the inverse of
-;; fpmask_comparison_operator).
-(define_predicate "invert_fpmask_comparison_operator"
-  (match_code "ne,unlt,unle"))
-
 ;; Return 1 if OP is a comparison operation suitable for integer vector/scalar
 ;; comparisons that generate a -1/0 mask.
 (define_predicate "vecint_comparison_operator"
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 4619142d197b..5beb44fc339b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -114,8 +114,23 @@ extern const char *rs6000_sibcall_template (rtx *, 
unsigned int);
 extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern const char *rs6000_pltseq_template (rtx *, int);
+
+/* Whether we can reverse the sense of an ordered (UNLT, UNLE, UNGT, UNGE,
+   UNEQ, or LTGT) comparison.  If we are doing floating point conditional moves
+   on power9 and above, we cannot convert an ordered comparison to unordered,
+   since the instructions (XSCMP{EQ,GT,GE}DP) that are used for conditional
+   moves can trap if an argument is a signalling NaN.  However for normal jumps
+   we can reverse a comparison since we only use unordered compare instructions
+   which do not trap on signalling NaNs.  */
+
+enum class rev_cond_ordered {
+  ordered_ok,
+  no_ordered
+};
+
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
-                                              enum rtx_code);
+                                              enum rtx_code,
+                                              enum rev_cond_ordered);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
 extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx);
 extern void rs6000_emit_sCOND (machine_mode, rtx[]);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 74709bb4bc19..ad5bbe9af5b0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15367,17 +15367,30 @@ rs6000_print_patchable_function_entry (FILE *file,
 }
 
 enum rtx_code
-rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
+rs6000_reverse_condition (machine_mode mode,
+                         enum rtx_code code,
+                         enum rev_cond_ordered ordered_cmp_ok)
 {
   /* Reversal of FP compares takes care -- an ordered compare
-     becomes an unordered compare and vice versa.  */
+     becomes an unordered compare and vice versa.
+
+     However, this is not safe for ordered comparisons (i.e. for isgreater,
+     etc.)  starting with the power9 because ifcvt.cc will want to create a fp
+     cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of
+     the arguments is a signalling NaN.  */
+
   if (mode == CCFPmode
-      && (!flag_finite_math_only
-         || code == UNLT || code == UNLE || code == UNGT || code == UNGE
+      && (code == UNLT || code == UNLE || code == UNGT || code == UNGE
          || code == UNEQ || code == LTGT))
-    return reverse_condition_maybe_unordered (code);
-  else
-    return reverse_condition (code);
+    {
+      if (ordered_cmp_ok == rev_cond_ordered::no_ordered
+         && !flag_finite_math_only)
+       return UNKNOWN;
+
+      return reverse_condition_maybe_unordered (code);
+    }
+
+  return reverse_condition (code);
 }
 
 /* Check if C (as 64bit integer) can be rotated to a constant which constains
@@ -15987,11 +16000,14 @@ rs6000_emit_sCOND (machine_mode mode, rtx operands[])
       rtx not_result = gen_reg_rtx (CCEQmode);
       rtx not_op, rev_cond_rtx;
       machine_mode cc_mode;
+      rtx_code rev;
 
       cc_mode = GET_MODE (XEXP (condition_rtx, 0));
 
-      rev_cond_rtx = gen_rtx_fmt_ee (rs6000_reverse_condition (cc_mode, 
cond_code),
-                                    SImode, XEXP (condition_rtx, 0), 
const0_rtx);
+      rev = rs6000_reverse_condition (cc_mode, cond_code,
+                                     rev_cond_ordered::ordered_ok);
+      rev_cond_rtx = gen_rtx_fmt_ee (rev, SImode, XEXP (condition_rtx, 0),
+                                    const0_rtx);
       not_op = gen_rtx_COMPARE (CCEQmode, rev_cond_rtx, const0_rtx);
       emit_insn (gen_rtx_SET (not_result, not_op));
       condition_rtx = gen_rtx_EQ (VOIDmode, not_result, const0_rtx);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 9267612fbc9c..cf51d904c5e4 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1812,11 +1812,17 @@ extern scalar_int_mode rs6000_pmode;
 
 /* Can the condition code MODE be safely reversed?  This is safe in
    all cases on this port, because at present it doesn't use the
-   trapping FP comparisons (fcmpo).  */
+   trapping FP comparisons (fcmpo).
+
+   However, this is not safe for ordered comparisons (i.e. for isgreater, etc.)
+   starting with the power9 because ifcvt.cc will want to create a fp cmove,
+   and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of the
+   arguments is a signalling NaN.  */
 #define REVERSIBLE_CC_MODE(MODE) 1
 
 /* Given a condition code and a mode, return the inverse condition.  */
-#define REVERSE_CONDITION(CODE, MODE) rs6000_reverse_condition (MODE, CODE)
+#define REVERSE_CONDITION(CODE, MODE)                                  \
+  rs6000_reverse_condition (MODE, CODE, rev_cond_ordered::no_ordered)
 
 
 /* Target cpu costs.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 65da0c653304..d9c9db5f1919 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5730,43 +5730,6 @@
  [(set_attr "length" "8")
   (set_attr "type" "vecperm")])
 
-;; Handle inverting the fpmask comparisons.
-(define_insn_and_split "*mov<SFDF:mode><SFDF2:mode>cc_invert_p9"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=&wa,wa")
-       (if_then_else:SFDF
-        (match_operator:CCFP 1 "invert_fpmask_comparison_operator"
-               [(match_operand:SFDF2 2 "vsx_register_operand" "wa,wa")
-                (match_operand:SFDF2 3 "vsx_register_operand" "wa,wa")])
-        (match_operand:SFDF 4 "vsx_register_operand" "wa,wa")
-        (match_operand:SFDF 5 "vsx_register_operand" "wa,wa")))
-   (clobber (match_scratch:V2DI 6 "=0,&wa"))]
-  "TARGET_P9_MINMAX"
-  "#"
-  "&& 1"
-  [(set (match_dup 6)
-       (if_then_else:V2DI (match_dup 9)
-                          (match_dup 7)
-                          (match_dup 8)))
-   (set (match_dup 0)
-       (if_then_else:SFDF (ne (match_dup 6)
-                              (match_dup 8))
-                          (match_dup 5)
-                          (match_dup 4)))]
-{
-  rtx op1 = operands[1];
-  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (op1));
-
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
-
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
-
-  operands[9] = gen_rtx_fmt_ee (cond, CCFPmode, operands[2], operands[3]);
-}
- [(set_attr "length" "8")
-  (set_attr "type" "vecperm")])
-
 (define_insn "*fpmask<mode>"
   [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
        (if_then_else:V2DI
@@ -5835,43 +5798,6 @@
  [(set_attr "length" "8")
   (set_attr "type" "vecperm")])
 
-;; Handle inverting the fpmask comparisons.
-(define_insn_and_split "*mov<mode>cc_invert_p10"
-  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=&v,v")
-       (if_then_else:IEEE128
-        (match_operator:CCFP 1 "invert_fpmask_comparison_operator"
-               [(match_operand:IEEE128 2 "altivec_register_operand" "v,v")
-                (match_operand:IEEE128 3 "altivec_register_operand" "v,v")])
-        (match_operand:IEEE128 4 "altivec_register_operand" "v,v")
-        (match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
-   (clobber (match_scratch:V2DI 6 "=0,&v"))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW"
-  "#"
-  "&& 1"
-  [(set (match_dup 6)
-       (if_then_else:V2DI (match_dup 9)
-                          (match_dup 7)
-                          (match_dup 8)))
-   (set (match_dup 0)
-       (if_then_else:IEEE128 (ne (match_dup 6)
-                                 (match_dup 8))
-                             (match_dup 5)
-                             (match_dup 4)))]
-{
-  rtx op1 = operands[1];
-  enum rtx_code cond = reverse_condition_maybe_unordered (GET_CODE (op1));
-
-  if (GET_CODE (operands[6]) == SCRATCH)
-    operands[6] = gen_reg_rtx (V2DImode);
-
-  operands[7] = CONSTM1_RTX (V2DImode);
-  operands[8] = CONST0_RTX (V2DImode);
-
-  operands[9] = gen_rtx_fmt_ee (cond, CCFPmode, operands[2], operands[3]);
-}
- [(set_attr "length" "8")
-  (set_attr "type" "vecperm")])
-
 (define_insn "*fpmask<mode>"
   [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
        (if_then_else:V2DI
@@ -13497,7 +13423,7 @@
 ;; If we are comparing the result of two comparisons, this can be done
 ;; using creqv or crxor.
 
-(define_insn_and_split ""
+(define_insn_and_split "*reverse_branch_comparison"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y")
        (compare:CCEQ (match_operator 1 "branch_comparison_operator"
                              [(match_operand 2 "cc_reg_operand" "y")
@@ -13512,6 +13438,7 @@
                                    (match_dup 5)))]
 {
   int positive_1, positive_2;
+  enum rev_cond_ordered order_ok = rev_cond_ordered::ordered_ok;
 
   positive_1 = branch_positive_comparison_operator (operands[1],
                                                    GET_MODE (operands[1]));
@@ -13519,19 +13446,25 @@
                                                    GET_MODE (operands[3]));
 
   if (! positive_1)
-    operands[1] = gen_rtx_fmt_ee (rs6000_reverse_condition (GET_MODE 
(operands[2]),
-                                                           GET_CODE 
(operands[1])),
-                                 SImode,
-                                 operands[2], const0_rtx);
+    {
+      rtx_code rev = rs6000_reverse_condition (GET_MODE (operands[2]),
+                                              GET_CODE (operands[1]),
+                                              order_ok);
+      gcc_assert (rev != UNKNOWN);
+      operands[1] = gen_rtx_fmt_ee (rev, SImode, operands[2], const0_rtx);
+    }
   else if (GET_MODE (operands[1]) != SImode)
     operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), SImode,
                                  operands[2], const0_rtx);
 
   if (! positive_2)
-    operands[3] = gen_rtx_fmt_ee (rs6000_reverse_condition (GET_MODE 
(operands[4]),
-                                                           GET_CODE 
(operands[3])),
-                                 SImode,
-                                 operands[4], const0_rtx);
+    {
+      rtx_code rev = rs6000_reverse_condition (GET_MODE (operands[4]),
+                                              GET_CODE (operands[3]),
+                                              order_ok);
+      gcc_assert (rev != UNKNOWN);
+      operands[3] = gen_rtx_fmt_ee (rev, SImode, operands[4], const0_rtx);
+    }
   else if (GET_MODE (operands[3]) != SImode)
     operands[3] = gen_rtx_fmt_ee (GET_CODE (operands[3]), SImode,
                                  operands[4], const0_rtx);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr118541.c 
b/gcc/testsuite/gcc.target/powerpc/pr118541.c
new file mode 100644
index 000000000000..80f1b2d2d83d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr118541.c
@@ -0,0 +1,141 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+/* PR target/118541 says that the ordered comparison functions like isgreater
+   should not optimize floating point conditional moves to use
+   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
+   if one of the arguments is a signaling NaN.  */
+
+/* Verify isgreater, isgreaterequal, isless, and islessequal does not generate
+   xscmpltdp, xscmpgtdp, or xscmpeqdp.  */
+
+double
+ordered_gt (double a, double b, double c, double d)
+{
+  /*
+   * fcmpu 0,1,2
+   * fmr 1,4
+   * bnglr 0
+   * fmr 1,3
+   * blr
+   */
+
+  return __builtin_isgreater (a, b) ? c : d;
+}
+
+double
+ordered_ge (double a, double b, double c, double d)
+{
+  /*
+   * fcmpu 0,1,2
+   * fmr 1,4
+   * cror 2,0,3
+   * beqlr 0
+   * fmr 1,3
+   * blr
+   */
+
+  return __builtin_isgreaterequal (a, b) ? c : d;
+}
+
+double
+ordered_lt (double a, double b, double c, double d)
+{
+  /*
+   * fcmpu 0,1,2
+   * fmr 1,4
+   * bnllr 0
+   * fmr 1,3
+   * blr
+   */
+
+  return __builtin_isless (a, b) ? c : d;
+}
+
+double
+ordered_le (double a, double b, double c, double d)
+{
+  /*
+   * fcmpu 0,1,2
+   * fmr 1,4
+   * cror 2,1,3
+   * beqlr 0
+   * fmr 1,3
+   * blr
+   */
+
+  return __builtin_islessequal (a, b) ? c : d;
+}
+
+double
+normal_gt (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgtdp 1,1,2
+   * xxsel 1,4,3,1
+   * blr
+   */
+
+  return a > b ? c : d;
+}
+
+double
+normal_ge (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgedp 1,1,2
+   * xxsel 1,4,3,1
+   * blr
+   */
+
+  return a >= b ? c : d;
+}
+
+double
+normal_lt (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgtdp 1,2,1
+   * xxsel 1,4,3,1
+   * blr
+   */
+
+  return a < b ? c : d;
+}
+
+double
+normal_le (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgedp 1,2,1
+   * xxsel 1,4,3,1
+   * blr
+   */
+
+  return a <= b ? c : d;
+}
+
+double
+normal_eq (double a, double b, double c, double d)
+{
+  /*
+   * xscmpeqdp 1,1,2
+   * xxsel 1,4,3,1
+   * blr
+   */
+
+  return a == b ? c : d;
+}
+
+double
+normal_ne (double a, double b, double c, double d)
+{
+  /*
+   * xscmpeqdp 1,1,2
+   * xxsel 1,3,4,1
+   * blr
+   */
+
+  return a != b ? c : d;
+}

[gcc(refs/users/meissner/heads/work204-bugs)] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

Reply via email to