Hi Mike,

The source code changes are missing.

Regards,
Surya

On 22/05/25 10:46 am, Michael Meissner wrote:
> Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
>
> This is version 3 of patch.  I re-implemented the patch to just focus on the
> generation of the XSCMP{EQ,GT,GE}{DP,QP} instructions.
>
> In bug PR target/118541 on power9, power10, and power11 systems, for the
> function:
>
>         extern double __ieee754_acos (double);
>
>         double
>         __acospi (double x)
>         {
>           double ret = __ieee754_acos (x) / 3.14;
>           return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
>         }
>
> GCC currently generates the following code:
>
>         Power9                          Power10 and Power11
>         ======                          ===================
>         bl __ieee754_acos               bl __ieee754_acos@notoc
>         nop                             plfd 0,.LC0@pcrel
>         addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
>         addi 1,1,32                     addi 1,1,32
>         lfd 0,.LC2@toc@l(9)             ld 0,16(1)
>         addis 9,2,.LC0@toc@ha           fdiv 0,1,0
>         ld 0,16(1)                      mtlr 0
>         lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
>         fdiv 0,1,0                      xxsel 1,0,12,1
>         mtlr 0                          blr
>         xscmpgtdp 1,0,12
>         xxsel 1,0,12,1
>         blr
>
> This is because ifcvt.c optimizes the conditional floating point move to use
> the XSCMPGTDP instruction.
>
> However, the XSCMPGTDP instruction will generate an interrupt if one of the
> arguments is a signalling NaN and signalling NaNs can generate an interrupt.
> The IEEE comparison functions (isgreater, etc.) require that the comparison
> not raise an interrupt.
>
> The root cause of this is we allow floating point comparisons to be reversed
> (i.e. LT will be reversed to UNGE).  Before power9, this was ok because we
> only generated the FCMPU or XSCMPUDP instructions.
>
> But with power9, we can generate the XSCMPEQDP, XSCMPGTDP, or XSCMPGEDP
> instructions.  This code now does not convert an unordered compare into an
> ordered compare.  Instead, it does the opposite comparison and swaps the
> arguments.  I.e. it converts:
>
>         r = (a < b) ? c : d;
>
> into:
>
>         r = (b >= a) ? c : d;
>
> For the following code:
>
>         double
>         ordered_compare (double a, double b, double c, double d)
>         {
>           return __builtin_isgreater (a, b) ? c : d;
>         }
>
>         /* Verify normal > does generate xscmpgtdp.  */
>
>         double
>         normal_compare (double a, double b, double c, double d)
>         {
>           return a > b ? c : d;
>         }
>
> with the following patch, GCC generates the following for power9, power10,
> and power11:
>
>         ordered_compare:
>                 fcmpu 0,1,2
>                 fmr 1,4
>                 bnglr 0
>                 fmr 1,3
>                 blr
>
>         normal_compare:
>                 xscmpgtdp 1,1,2
>                 xxsel 1,4,3,1
>                 blr
>
> I have built bootstrap compilers on big endian power9 systems and little
> endian power9/power10 systems and there were no regressions.  Can I check
> this patch into the GCC trunk, and after a waiting period, can I check this
> into the active older branches?
>
> 2025-05-21  Michael Meissner  <meiss...@linux.ibm.com>
>
> gcc/
>
>         PR target/118541
>         * config/rs6000/predicates.md (invert_fpmask_comparison_operator):
>         Delete.
>         (fpmask_reverse_args_comparison_operator): New predicate.
>         * config/rs6000/rs6000-proto.h (rs6000_fpmask_reverse_args): New
>         declaration.
>         * config/rs6000/rs6000.cc (rs6000_fpmask_reverse_args): New function.
>         * config/rs6000/rs6000.h (REVERSIBLE_CC_MODE): Do not allow floating
>         point comparisons to be reversed unless -ffinite-math-only is used.
>         * config/rs6000/rs6000.md (mov<SFDF:mode><SFDF2:mode>cc_p9): Add
>         comment.
>         (mov<SFDF:mode><SFDF2:mode>cc_invert_p9): Reverse the argument order
>         for the comparison, and use an unordered comparison, instead of
>         ordered comparison.
>         (mov<mode>cc_invert_p10): Likewise.
>
> gcc/testsuite/
>
>         PR target/118541
>         * gcc.target/powerpc/pr118541.c: New test.