[PATCH, V4] PR target/118541 - Do not generate unordered fp cmoves for IEEE compares on PowerPC

Michael Meissner Wed, 26 Mar 2025 18:56:08 -0700

This is version 4 of the patch for PR target/118541.

In version 4, I made the following changes:


    1:  I changed the use of enums to match current C++.  Thanks to Florian
    Weimer.

In version 3, I made the following changes:

    1:  The new argument to rs6000_reverse_condition that says whether we should
        allow ordered floating point compares to be reversed is now an
        enumeration instead of a boolean.

    2:  I tried to make the code in rs6000_reverse_condition clearer.

    3:  I added checks in invert_fpmask_comparison_operator to prevent ordered
        floating point compares from being reversed unless -ffast-math.

    4:  I split the test cases into 4 separate tests (ordered vs. unordered
        compare and -O2 vs. -Ofast).

In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

        extern double __ieee754_acos (double);

        double
        __acospi (double x)
        {
          double ret = __ieee754_acos (x) / 3.14;
          return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
        }

GCC currently generates the following code:

        Power9                          Power10 and Power11
        ======                          ===================
        bl __ieee754_acos               bl __ieee754_acos@notoc
        nop                             plfd 0,.LC0@pcrel
        addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
        addi 1,1,32                     addi 1,1,32
        lfd 0,.LC2@toc@l(9)             ld 0,16(1)
        addis 9,2,.LC0@toc@ha           fdiv 0,1,0
        ld 0,16(1)                      mtlr 0
        lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
        fdiv 0,1,0                      xxsel 1,0,12,1
        mtlr 0                          blr
        xscmpgtdp 1,0,12
        xxsel 1,0,12,1
        blr

This is because ifcvt.c optimizes the conditional floating point move to use the
XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN and signalling NaNs can generate an interrupt.
The IEEE comparison functions (isgreater, etc.) require that the comparison not
raise an interrupt.

The following patch changes the PowerPC back end so that ifcvt.c will not change
the if/then test and move into a conditional move if the comparison is one of
the comparisons that do not raise an error with signalling NaNs and -Ofast is
not used.  If a normal comparison is used or -Ofast is used, GCC will continue
to generate XSCMPGTDP and XXSEL.

For the following code:

        double
        ordered_compare (double a, double b, double c, double d)
        {
          return __builtin_isgreater (a, b) ? c : d;
        }

        /* Verify normal > does generate xscmpgtdp.  */

        double
        normal_compare (double a, double b, double c, double d)
        {
          return a > b ? c : d;
        }

with the following patch, GCC generates the following for power9, power10, and
power11:

        ordered_compare:
                fcmpu 0,1,2
                fmr 1,4
                bnglr 0
                fmr 1,3
                blr

        normal_compare:
                xscmpgtdp 1,1,2
                xxsel 1,4,3,1
                blr

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

2025-03-26  Michael Meissner  <[email protected]>

gcc/

        PR target/118541
        * config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do
        not allow UNLT and UNLE unless -ffast-math.
        * config/rs6000/rs6000-protos.h (enum rev_cond_ordered): New 
enumeration.
        (rs6000_reverse_condition): Add argument.
        * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
        ordered comparisons to be reversed for floating point conditional moves,
        but allow ordered comparisons to be reversed on jumps.
        (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
        * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
        * config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
        Adjust rs6000_reverse_condition calls.

gcc/testsuite/

        PR target/118541
        * gcc.target/powerpc/pr118541-1.c: New test.
        * gcc.target/powerpc/pr118541-2.c: Likewise.
        * gcc.target/powerpc/pr118541-3.c: Likewise.
        * gcc.target/powerpc/pr118541-4.c: Likewise.
---
 gcc/config/rs6000/predicates.md               | 10 +++-
 gcc/config/rs6000/rs6000-protos.h             | 17 ++++++-
 gcc/config/rs6000/rs6000.cc                   | 46 ++++++++++++++-----
 gcc/config/rs6000/rs6000.h                    | 10 +++-
 gcc/config/rs6000/rs6000.md                   | 25 ++++++----
 gcc/testsuite/gcc.target/powerpc/pr118541-1.c | 28 +++++++++++
 gcc/testsuite/gcc.target/powerpc/pr118541-2.c | 26 +++++++++++
 gcc/testsuite/gcc.target/powerpc/pr118541-3.c | 26 +++++++++++
 gcc/testsuite/gcc.target/powerpc/pr118541-4.c | 26 +++++++++++
 9 files changed, 190 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-4.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 647e89afb6a..ba8df6a7979 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1466,8 +1466,16 @@ (define_predicate "fpmask_comparison_operator"
 ;; Return 1 if OP is a comparison operator suitable for vector/scalar
 ;; comparisons that generate a 0/-1 mask (i.e. the inverse of
 ;; fpmask_comparison_operator).
+;;
+;; invert_fpmask_comparison_operator is used to form floating point conditional
+;; moves on power9.  The instructions that would be generated (xscmpeqdp,
+;; xscmpgtdp, or xscmpgedp) will raise an error if one of the arguments is a
+;; signalling NaN.  Don't allow the test to be inverted if NaNs are supported
+;; and the comparison is an ordered comparison.
 (define_predicate "invert_fpmask_comparison_operator"
-  (match_code "ne,unlt,unle"))
+  (ior (match_code "ne")
+       (and (match_code "unlt,unle")
+           (match_test "flag_finite_math_only"))))
 
 ;; Return 1 if OP is a comparison operation suitable for integer vector/scalar
 ;; comparisons that generate a -1/0 mask.
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 234eb0ae2b3..08c1be99d91 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -114,8 +114,23 @@ extern const char *rs6000_sibcall_template (rtx *, 
unsigned int);
 extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern const char *rs6000_pltseq_template (rtx *, int);
+
+/* Whether we can reverse the sense of an ordered (UNLT, UNLE, UNGT, UNGE,
+   UNEQ, or LTGT) comairson.  If we are doing floating point conditional moves
+   on power9 and above, we cannot convert an ordered comparison to unordered,
+   since the instructions (XSCMP{EQ,GT,GE}DP) that are used for conditional
+   moves can trap if an argument is a signalling NaN.  However for normal jumps
+   we can reverse a comparison since we only use unordered compare instructions
+   which do not trap on signalling NaNs.  */
+
+enum rev_cond_ordered {
+  ordered_ok,
+  no_ordered
+};
+
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
-                                              enum rtx_code);
+                                              enum rtx_code,
+                                              enum rev_cond_ordered);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
 extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx);
 extern void rs6000_emit_sCOND (machine_mode, rtx[]);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 737c3d6f7c7..2c13e9edf00 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15267,17 +15267,38 @@ rs6000_print_patchable_function_entry (FILE *file,
 }
 
 enum rtx_code
-rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
+rs6000_reverse_condition (machine_mode mode,
+                         enum rtx_code code,
+                         enum rev_cond_ordered ordered_cmp_ok)
 {
   /* Reversal of FP compares takes care -- an ordered compare
-     becomes an unordered compare and vice versa.  */
-  if (mode == CCFPmode
-      && (!flag_finite_math_only
-         || code == UNLT || code == UNLE || code == UNGT || code == UNGE
-         || code == UNEQ || code == LTGT))
-    return reverse_condition_maybe_unordered (code);
-  else
-    return reverse_condition (code);
+     becomes an unordered compare and vice versa.
+
+     However, this is not safe for ordered comparisons (i.e. for isgreater,
+     etc.)  starting with the power9 because ifcvt.cc will want to create a fp
+     cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of
+     the arguments is a signalling NaN.  */
+
+  if (mode == CCFPmode)
+    {
+      /* If NaNs are allowed, don't allow the reversal of floating point
+        comparisons when the comparison is used in the context of a floating
+        point conditional move when no_ordered is passed.  We do allow the
+        comparsion to be reversed for explicit jumps when ordered_ok is
+        passed.  */
+      if (!flag_finite_math_only)
+       return (ordered_cmp_ok == rev_cond_ordered::no_ordered
+               ? UNKNOWN
+               : reverse_condition_maybe_unordered (code));
+
+      /* Explicit ordered comparisions can be reversed if NaNs are not
+        allowed.  */
+      else if (code == UNLT || code == UNLE || code == UNGT || code == UNGE
+              || code == UNEQ || code == LTGT)
+       return reverse_condition_maybe_unordered (code);
+    }
+
+  return reverse_condition (code);
 }
 
 /* Check if C (as 64bit integer) can be rotated to a constant which constains
@@ -15887,11 +15908,14 @@ rs6000_emit_sCOND (machine_mode mode, rtx operands[])
       rtx not_result = gen_reg_rtx (CCEQmode);
       rtx not_op, rev_cond_rtx;
       machine_mode cc_mode;
+      enum rtx_code rev;
 
       cc_mode = GET_MODE (XEXP (condition_rtx, 0));
 
-      rev_cond_rtx = gen_rtx_fmt_ee (rs6000_reverse_condition (cc_mode, 
cond_code),
-                                    SImode, XEXP (condition_rtx, 0), 
const0_rtx);
+      rev = rs6000_reverse_condition (cc_mode, cond_code,
+                                     rev_cond_ordered::ordered_ok);
+      rev_cond_rtx = gen_rtx_fmt_ee (rev, SImode, XEXP (condition_rtx, 0),
+                                    const0_rtx);
       not_op = gen_rtx_COMPARE (CCEQmode, rev_cond_rtx, const0_rtx);
       emit_insn (gen_rtx_SET (not_result, not_op));
       condition_rtx = gen_rtx_EQ (VOIDmode, not_result, const0_rtx);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index db6112a09e1..712a3b43190 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1797,11 +1797,17 @@ extern scalar_int_mode rs6000_pmode;
 
 /* Can the condition code MODE be safely reversed?  This is safe in
    all cases on this port, because at present it doesn't use the
-   trapping FP comparisons (fcmpo).  */
+   trapping FP comparisons (fcmpo).
+
+   However, this is not safe for ordered comparisons (i.e. for isgreater, etc.)
+   starting with the power9 because ifcvt.cc will want to create a fp cmove,
+   and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if one of the
+   arguments is a signalling NaN.  */
 #define REVERSIBLE_CC_MODE(MODE) 1
 
 /* Given a condition code and a mode, return the inverse condition.  */
-#define REVERSE_CONDITION(CODE, MODE) rs6000_reverse_condition (MODE, CODE)
+#define REVERSE_CONDITION(CODE, MODE)                                  \
+  rs6000_reverse_condition (MODE, CODE, rev_cond_ordered::no_ordered)
 
 
 /* Target cpu costs.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9c718ca2a22..e8f1a5ec375 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13451,7 +13451,7 @@ (define_insn "@cceq_rev_compare_<mode>"
 ;; If we are comparing the result of two comparisons, this can be done
 ;; using creqv or crxor.
 
-(define_insn_and_split ""
+(define_insn_and_split "*reverse_branch_comparison"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y")
        (compare:CCEQ (match_operator 1 "branch_comparison_operator"
                              [(match_operand 2 "cc_reg_operand" "y")
@@ -13466,6 +13466,7 @@ (define_insn_and_split ""
                                    (match_dup 5)))]
 {
   int positive_1, positive_2;
+  enum rev_cond_ordered order_ok = rev_cond_ordered::ordered_ok;
 
   positive_1 = branch_positive_comparison_operator (operands[1],
                                                    GET_MODE (operands[1]));
@@ -13473,19 +13474,25 @@ (define_insn_and_split ""
                                                    GET_MODE (operands[3]));
 
   if (! positive_1)
-    operands[1] = gen_rtx_fmt_ee (rs6000_reverse_condition (GET_MODE 
(operands[2]),
-                                                           GET_CODE 
(operands[1])),
-                                 SImode,
-                                 operands[2], const0_rtx);
+    {
+      enum rtx_code rev = rs6000_reverse_condition (GET_MODE (operands[2]),
+                                                   GET_CODE (operands[1]),
+                                                   order_ok);
+      gcc_assert (rev != UNKNOWN);
+      operands[1] = gen_rtx_fmt_ee (rev, SImode, operands[2], const0_rtx);
+    }
   else if (GET_MODE (operands[1]) != SImode)
     operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), SImode,
                                  operands[2], const0_rtx);
 
   if (! positive_2)
-    operands[3] = gen_rtx_fmt_ee (rs6000_reverse_condition (GET_MODE 
(operands[4]),
-                                                           GET_CODE 
(operands[3])),
-                                 SImode,
-                                 operands[4], const0_rtx);
+    {
+      enum rtx_code rev = rs6000_reverse_condition (GET_MODE (operands[4]),
+                                                   GET_CODE (operands[3]),
+                                                   order_ok);
+      gcc_assert (rev != UNKNOWN);
+      operands[3] = gen_rtx_fmt_ee (rev, SImode, operands[4], const0_rtx);
+    }
   else if (GET_MODE (operands[3]) != SImode)
     operands[3] = gen_rtx_fmt_ee (GET_CODE (operands[3]), SImode,
                                  operands[4], const0_rtx);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr118541-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr118541-1.c
new file mode 100644
index 00000000000..d5690dd7e38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr118541-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+/* PR target/118541 says that the ordered comparison functions like isgreater
+   should not optimize floating point conditional moves to use
+   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
+   if one of the arguments is a signaling NaN.  */
+
+/* Verify isgreater does not generate xscmpgtdp when NaNs are allowed.  */
+
+double
+ordered_compare (double a, double b, double c, double d)
+{
+  /*
+   * fcmpu 0,1,2
+   * fmr   1,4
+   * bnglr 0
+   * fmr   1,3
+   * blr
+   */
+
+  return __builtin_isgreater (a, b) ? c : d;
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpg[te]dp\M}       } } */
+/* { dg-final { scan-assembler-not {\mxxsel\M}              } } */
+/* { dg-final { scan-assembler     {\mxscmpudp\M|\mfcmpu\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr118541-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr118541-2.c
new file mode 100644
index 00000000000..5e1d83daeda
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr118541-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -Ofast" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+/* PR target/118541 says that the ordered comparison functions like isgreater
+   should not optimize floating point conditional moves to use
+   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
+   if one of the arguments is a signaling NaN.  */
+
+/* Verify isgreater does generate xscmpgtdp when NaNs are not allowed.  */
+
+double
+ordered_compare (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgtdp 1,1,2
+   * xxsel     1,4,3,1
+   * blr
+   */
+
+  return __builtin_isgreater (a, b) ? c : d;
+}
+
+/* { dg-final { scan-assembler     {\mxscmpg[te]dp\M}       } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}              } } */
+/* { dg-final { scan-assembler-not {\mxscmpudp\M|\mfcmpu\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr118541-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr118541-3.c
new file mode 100644
index 00000000000..bac7486d211
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr118541-3.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+/* PR target/118541 says that the ordered comparison functions like isgreater
+   should not optimize floating point conditional moves to use
+   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
+   if one of the arguments is a signaling NaN.  */
+
+/* Verify normal > does generate xscmpgtdp when NaNs are allowed.  */
+
+double
+normal_compare (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgtdp 1,1,2
+   * xxsel     1,4,3,1
+   * blr
+   */
+
+  return a > b ? c : d;
+}
+
+/* { dg-final { scan-assembler     {\mxscmpg[te]dp\M}       } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}              } } */
+/* { dg-final { scan-assembler-not {\mxscmpudp\M|\mfcmpu\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr118541-4.c 
b/gcc/testsuite/gcc.target/powerpc/pr118541-4.c
new file mode 100644
index 00000000000..3e82a937c7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr118541-4.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -Ofast" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+/* PR target/118541 says that the ordered comparison functions like isgreater
+   should not optimize floating point conditional moves to use
+   x{s,v}cmp{eq,gt,ge}{dp,qp} and xxsel since that instruction can cause traps
+   if one of the arguments is a signaling NaN.  */
+
+/* Verify normal > does generate xscmpgtdp when NaNs are not allowed.  */
+
+double
+normal_compare (double a, double b, double c, double d)
+{
+  /*
+   * xscmpgtdp 1,1,2
+   * xxsel     1,4,3,1
+   * blr
+   */
+
+  return a > b ? c : d;
+}
+
+/* { dg-final { scan-assembler     {\mxscmpg[te]dp\M}       } } */
+/* { dg-final { scan-assembler     {\mxxsel\M}              } } */
+/* { dg-final { scan-assembler-not {\mxscmpudp\M|\mfcmpu\M} } } */
-- 
2.48.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]

[PATCH, V4] PR target/118541 - Do not generate unordered fp cmoves for IEEE compares on PowerPC

Reply via email to