Re: [PATCH] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

Michael Meissner Fri, 31 Jan 2025 14:35:50 -0800

On Fri, Jan 31, 2025 at 08:04:53AM +0100, Richard Biener wrote:
> On Fri, Jan 31, 2025 at 3:55 AM Michael Meissner <meiss...@linux.ibm.com> 
> wrote:
> >
> > Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
> >
> > In bug PR target/118541 on power9, power10, and power11 systems, for the
> > function:
> >
> >         extern double __ieee754_acos (double);
> >
> >         double
> >         __acospi (double x)
> >         {
> >           double ret = __ieee754_acos (x) / 3.14;
> >           return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
> >         }
> >
> > GCC currently generates the following code:
> >
> >         Power9                          Power10 and Power11
> >         ======                          ===================
> >         bl __ieee754_acos               bl __ieee754_acos@notoc
> >         nop                             plfd 0,.LC0@pcrel
> >         addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
> >         addi 1,1,32                     addi 1,1,32
> >         lfd 0,.LC2@toc@l(9)             ld 0,16(1)
> >         addis 9,2,.LC0@toc@ha           fdiv 0,1,0
> >         ld 0,16(1)                      mtlr 0
> >         lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
> >         fdiv 0,1,0                      xxsel 1,0,12,1
> >         mtlr 0                          blr
> >         xscmpgtdp 1,0,12
> >         xxsel 1,0,12,1
> >         blr
> >
> > This is because ifcvt.c optimizes the conditional floating point move to 
> > use the
> > XSCMPGTDP instruction.
> >
> > However, the XSCMPGTDP instruction will generate an interrupt if one of the
> > arguments is a signalling NaN and signalling NaNs can generate an interrupt.
> > The IEEE comparison functions (isgreater, etc.) require that the comparison 
> > not
> > raise an interrupt.
> >
> > The following patch changes the PowerPC back end so that ifcvt.c will not 
> > change
> > the if/then test and move into a conditional move if the comparison is one 
> > of
> > the comparisons that do not raise an error with signalling NaNs and -Ofast 
> > is
> > not used.  If a normal comparison is used or -Ofast is used, GCC will 
> > continue
> > to generate XSCMPGTDP and XXSEL.
> >
> > For the following code:
> >
> >         double
> >         ordered_compare (double a, double b, double c, double d)
> >         {
> >           return __builtin_isgreater (a, b) ? c : d;
> >         }
> >
> >         /* Verify normal > does generate xscmpgtdp.  */
> >
> >         double
> >         normal_compare (double a, double b, double c, double d)
> >         {
> >           return a > b ? c : d;
> >         }
> >
> > with the following patch, GCC generates the following for power9, power10, 
> > and
> > power11:
> >
> >         ordered_compare:
> >                 fcmpu 0,1,2
> >                 fmr 1,4
> >                 bnglr 0
> >                 fmr 1,3
> >                 blr
> >
> >         normal_compare:
> >                 xscmpgtdp 1,1,2
> >                 xxsel 1,4,3,1
> >                 blr
> >
> > I have built bootstrap compilers on big endian power9 systems and little 
> > endian
> > power9/power10 systems and there were no regressions.  Can I check this 
> > patch
> > into the GCC trunk, and after a waiting period, can I check this into the 
> > active
> > older branches?
> >
> > 2025-01-30  Michael Meissner  <meiss...@linux.ibm.com>
> >
> > gcc/
> >
> >         PR target/118541
> >         * config/rs6000/rs6000-protos.h (REVERSE_COND_ORDERED_OK): New 
> > macro.
> >         (REVERSE_COND_NO_ORDERED): Likewise.
> >         (rs6000_reverse_condition): Add argument.
> >         * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
> >         ordered comparisons to be reversed for floating point cmoves.
> >         (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
> >         * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
> >         * config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
> >         Adjust rs6000_reverse_condition call.
> >
> > gcc/testsuite/
> >
> >         PR target/118541
> >         * gcc.target/powerpc/pr118541.c: New test.
> > ---
> >  gcc/config/rs6000/rs6000-protos.h           |  6 ++-
> >  gcc/config/rs6000/rs6000.cc                 | 23 ++++++++---
> >  gcc/config/rs6000/rs6000.h                  | 10 ++++-
> >  gcc/config/rs6000/rs6000.md                 | 24 +++++++-----
> >  gcc/testsuite/gcc.target/powerpc/pr118541.c | 43 +++++++++++++++++++++
> >  5 files changed, 89 insertions(+), 17 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541.c
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 4619142d197..112332660d3 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -114,8 +114,12 @@ extern const char *rs6000_sibcall_template (rtx *, 
> > unsigned int);
> >  extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
> >  extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
> >  extern const char *rs6000_pltseq_template (rtx *, int);
> > +
> > +#define REVERSE_COND_ORDERED_OK        false
> > +#define REVERSE_COND_NO_ORDERED        true
> > +
> >  extern enum rtx_code rs6000_reverse_condition (machine_mode,
> > -                                              enum rtx_code);
> > +                                              enum rtx_code, bool);
> >  extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
> >  extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx);
> >  extern void rs6000_emit_sCOND (machine_mode, rtx[]);
> > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> > index f9f9a0b931d..eaf79435ec3 100644
> > --- a/gcc/config/rs6000/rs6000.cc
> > +++ b/gcc/config/rs6000/rs6000.cc
> > @@ -15360,15 +15360,25 @@ rs6000_print_patchable_function_entry (FILE *file,
> >  }
> >
> >  enum rtx_code
> > -rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
> > +rs6000_reverse_condition (machine_mode mode,
> > +                         enum rtx_code code,
> > +                         bool no_ordered)
> >  {
> >    /* Reversal of FP compares takes care -- an ordered compare
> > -     becomes an unordered compare and vice versa.  */
> > +     becomes an unordered compare and vice versa.
> > +
> > +     However, this is not safe for ordered comparisons (i.e. for isgreater,
> > +     etc.)  starting with the power9 because ifcvt.cc will want to create 
> > a fp
> > +     cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if 
> > one of
> > +     the arguments is a signalling NaN.  */
> > +
> >    if (mode == CCFPmode
> >        && (!flag_finite_math_only
> 
> I fail to see where -Ofast is allowed, but only see the pre-existing
> flag check above
> which checks for no NaN/Inf rather than sNaN - the latter would be checked 
> with
> HONOR_SNANS (mode).


The !flag_finite_math_only will allow the optimization to be compiled, but
using HONOR_NANS is probably a better way.  I'll rewrite the test.  Thanks.

For:

        double
        ordered_compare (double a, double b, double c, double d)
        {
          return __builtin_isgreater (a, b) ? c : d;
        }

Here is what -Ofast generates:

        xscmpgedp 1,2,1
        xxsel 1,3,4,1

Here is what -O2 generates:

       fcmpu 0,1,2
        fmr 1,4
        bnglr 0
        fmr 1,3
        blr

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH] Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

Reply via email to