On Fri, Jan 31, 2025 at 08:04:53AM +0100, Richard Biener wrote: > On Fri, Jan 31, 2025 at 3:55 AM Michael Meissner <meiss...@linux.ibm.com> > wrote: > > > > Fix PR 118541, do not generate unordered fp cmoves for IEEE compares. > > > > In bug PR target/118541 on power9, power10, and power11 systems, for the > > function: > > > > extern double __ieee754_acos (double); > > > > double > > __acospi (double x) > > { > > double ret = __ieee754_acos (x) / 3.14; > > return __builtin_isgreater (ret, 1.0) ? 1.0 : ret; > > } > > > > GCC currently generates the following code: > > > > Power9 Power10 and Power11 > > ====== =================== > > bl __ieee754_acos bl __ieee754_acos@notoc > > nop plfd 0,.LC0@pcrel > > addis 9,2,.LC2@toc@ha xxspltidp 12,1065353216 > > addi 1,1,32 addi 1,1,32 > > lfd 0,.LC2@toc@l(9) ld 0,16(1) > > addis 9,2,.LC0@toc@ha fdiv 0,1,0 > > ld 0,16(1) mtlr 0 > > lfd 12,.LC0@toc@l(9) xscmpgtdp 1,0,12 > > fdiv 0,1,0 xxsel 1,0,12,1 > > mtlr 0 blr > > xscmpgtdp 1,0,12 > > xxsel 1,0,12,1 > > blr > > > > This is because ifcvt.c optimizes the conditional floating point move to > > use the > > XSCMPGTDP instruction. > > > > However, the XSCMPGTDP instruction will generate an interrupt if one of the > > arguments is a signalling NaN and signalling NaNs can generate an interrupt. > > The IEEE comparison functions (isgreater, etc.) require that the comparison > > not > > raise an interrupt. > > > > The following patch changes the PowerPC back end so that ifcvt.c will not > > change > > the if/then test and move into a conditional move if the comparison is one > > of > > the comparisons that do not raise an error with signalling NaNs and -Ofast > > is > > not used. If a normal comparison is used or -Ofast is used, GCC will > > continue > > to generate XSCMPGTDP and XXSEL. > > > > For the following code: > > > > double > > ordered_compare (double a, double b, double c, double d) > > { > > return __builtin_isgreater (a, b) ? c : d; > > } > > > > /* Verify normal > does generate xscmpgtdp. */ > > > > double > > normal_compare (double a, double b, double c, double d) > > { > > return a > b ? c : d; > > } > > > > with the following patch, GCC generates the following for power9, power10, > > and > > power11: > > > > ordered_compare: > > fcmpu 0,1,2 > > fmr 1,4 > > bnglr 0 > > fmr 1,3 > > blr > > > > normal_compare: > > xscmpgtdp 1,1,2 > > xxsel 1,4,3,1 > > blr > > > > I have built bootstrap compilers on big endian power9 systems and little > > endian > > power9/power10 systems and there were no regressions. Can I check this > > patch > > into the GCC trunk, and after a waiting period, can I check this into the > > active > > older branches? > > > > 2025-01-30 Michael Meissner <meiss...@linux.ibm.com> > > > > gcc/ > > > > PR target/118541 > > * config/rs6000/rs6000-protos.h (REVERSE_COND_ORDERED_OK): New > > macro. > > (REVERSE_COND_NO_ORDERED): Likewise. > > (rs6000_reverse_condition): Add argument. > > * config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow > > ordered comparisons to be reversed for floating point cmoves. > > (rs6000_emit_sCOND): Adjust rs6000_reverse_condition call. > > * config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise. > > * config/rs6000/rs6000.md (reverse_branch_comparison): Name insn. > > Adjust rs6000_reverse_condition call. > > > > gcc/testsuite/ > > > > PR target/118541 > > * gcc.target/powerpc/pr118541.c: New test. > > --- > > gcc/config/rs6000/rs6000-protos.h | 6 ++- > > gcc/config/rs6000/rs6000.cc | 23 ++++++++--- > > gcc/config/rs6000/rs6000.h | 10 ++++- > > gcc/config/rs6000/rs6000.md | 24 +++++++----- > > gcc/testsuite/gcc.target/powerpc/pr118541.c | 43 +++++++++++++++++++++ > > 5 files changed, 89 insertions(+), 17 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541.c > > > > diff --git a/gcc/config/rs6000/rs6000-protos.h > > b/gcc/config/rs6000/rs6000-protos.h > > index 4619142d197..112332660d3 100644 > > --- a/gcc/config/rs6000/rs6000-protos.h > > +++ b/gcc/config/rs6000/rs6000-protos.h > > @@ -114,8 +114,12 @@ extern const char *rs6000_sibcall_template (rtx *, > > unsigned int); > > extern const char *rs6000_indirect_call_template (rtx *, unsigned int); > > extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int); > > extern const char *rs6000_pltseq_template (rtx *, int); > > + > > +#define REVERSE_COND_ORDERED_OK false > > +#define REVERSE_COND_NO_ORDERED true > > + > > extern enum rtx_code rs6000_reverse_condition (machine_mode, > > - enum rtx_code); > > + enum rtx_code, bool); > > extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx); > > extern rtx rs6000_emit_fp_cror (rtx_code, machine_mode, rtx); > > extern void rs6000_emit_sCOND (machine_mode, rtx[]); > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > > index f9f9a0b931d..eaf79435ec3 100644 > > --- a/gcc/config/rs6000/rs6000.cc > > +++ b/gcc/config/rs6000/rs6000.cc > > @@ -15360,15 +15360,25 @@ rs6000_print_patchable_function_entry (FILE *file, > > } > > > > enum rtx_code > > -rs6000_reverse_condition (machine_mode mode, enum rtx_code code) > > +rs6000_reverse_condition (machine_mode mode, > > + enum rtx_code code, > > + bool no_ordered) > > { > > /* Reversal of FP compares takes care -- an ordered compare > > - becomes an unordered compare and vice versa. */ > > + becomes an unordered compare and vice versa. > > + > > + However, this is not safe for ordered comparisons (i.e. for isgreater, > > + etc.) starting with the power9 because ifcvt.cc will want to create > > a fp > > + cmove, and the x{s,v}cmp{eq,gt,ge}{dp,qp} instructions will trap if > > one of > > + the arguments is a signalling NaN. */ > > + > > if (mode == CCFPmode > > && (!flag_finite_math_only > > I fail to see where -Ofast is allowed, but only see the pre-existing > flag check above > which checks for no NaN/Inf rather than sNaN - the latter would be checked > with > HONOR_SNANS (mode).
The !flag_finite_math_only will allow the optimization to be compiled, but using HONOR_NANS is probably a better way. I'll rewrite the test. Thanks. For: double ordered_compare (double a, double b, double c, double d) { return __builtin_isgreater (a, b) ? c : d; } Here is what -Ofast generates: xscmpgedp 1,2,1 xxsel 1,3,4,1 Here is what -O2 generates: fcmpu 0,1,2 fmr 1,4 bnglr 0 fmr 1,3 blr -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com