Re: Fix PR 118541 (V3), do not generate unordered fp cmoves for IEEE compares

Surya Kumari Jangala Thu, 22 May 2025 01:48:28 -0700

Hi Mike,
The source code changes are missing.

Regards,
Surya


On 22/05/25 10:46 am, Michael Meissner wrote:
> Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.
> 
> This is version 3 of the patch.  I re-implemented the patch to just focus on the
> generation of the XSCMP{EQ,GT,GE}{DP,QP} instructions.
> 
> In bug PR target/118541 on power9, power10, and power11 systems, for the
> function:
> 
>         extern double __ieee754_acos (double);
> 
>         double
>         __acospi (double x)
>         {
>           double ret = __ieee754_acos (x) / 3.14;
>           return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
>         }
> 
> GCC currently generates the following code:
> 
>         Power9                          Power10 and Power11
>         ======                          ===================
>         bl __ieee754_acos               bl __ieee754_acos@notoc
>         nop                             plfd 0,.LC0@pcrel
>         addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
>         addi 1,1,32                     addi 1,1,32
>         lfd 0,.LC2@toc@l(9)             ld 0,16(1)
>         addis 9,2,.LC0@toc@ha           fdiv 0,1,0
>         ld 0,16(1)                      mtlr 0
>         lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
>         fdiv 0,1,0                      xxsel 1,0,12,1
>         mtlr 0                          blr
>         xscmpgtdp 1,0,12
>         xxsel 1,0,12,1
>         blr
> 
> This is because ifcvt.c optimizes the conditional floating point move to use
> the XSCMPGTDP instruction.
> 
> However, the XSCMPGTDP instruction will generate an interrupt if one of the
> arguments is a signalling NaN.  The IEEE comparison functions (isgreater,
> etc.) require that the comparison not raise an interrupt.
> 
> The root cause of this is we allow floating point comparisons to be reversed
> (i.e. LT will be reversed to UNGE).  Before power9, this was ok because we
> only generated the FCMPU or XSCMPUDP instructions.
> 
> But with power9, we can generate the XSCMPEQDP, XSCMPGTDP, or XSCMPGEDP
> instructions.  This code now does not convert an unordered compare into an
> ordered compare.  Instead, it does the opposite comparison and swaps the
> arguments.  I.e. it converts:
> 
>       r = (a < b) ? c : d;
> 
> into:
> 
>       r = (b > a) ? c : d;
> 
> For the following code:
> 
>         double
>         ordered_compare (double a, double b, double c, double d)
>         {
>           return __builtin_isgreater (a, b) ? c : d;
>         }
> 
>         /* Verify normal > does generate xscmpgtdp.  */
> 
>         double
>         normal_compare (double a, double b, double c, double d)
>         {
>           return a > b ? c : d;
>         }
> 
> with the following patch, GCC generates the following for power9, power10, and
> power11:
> 
>         ordered_compare:
>                 fcmpu 0,1,2
>                 fmr 1,4
>                 bnglr 0
>                 fmr 1,3
>                 blr
> 
>         normal_compare:
>                 xscmpgtdp 1,1,2
>                 xxsel 1,4,3,1
>                 blr
> 
> I have built bootstrap compilers on big endian power9 systems and little
> endian power9/power10 systems and there were no regressions.  Can I check
> this patch into the GCC trunk, and after a waiting period, can I check this
> into the active older branches?
> 
> 2025-05-21  Michael Meissner  <[email protected]>
> 
> gcc/
> 
>       PR target/118541
>       * config/rs6000/predicates.md (invert_fpmask_comparison_operator):
>       Delete.
>       (fpmask_reverse_args_comparison_operator): New predicate.
>       * config/rs6000/rs6000-proto.h (rs6000_fpmask_reverse_args): New
>       declaration.
>       * config/rs6000/rs6000.cc (rs6000_fpmask_reverse_args): New function.
>       * config/rs6000/rs6000.h (REVERSIBLE_CC_MODE): Do not allow floating
>       point comparisons to be reversed unless -ffinite-math-only is used.
>       * config/rs6000/rs6000.md (mov<SFDF:mode><SFDF2:mode>cc_p9): Add
>       comment.
>       (mov<SFDF:mode><SFDF2:mode>cc_invert_p9): Reverse the argument order for
>       the comparison, and use an unordered comparison, instead of ordered
>       comparison.
>       (mov<mode>cc_invert_p10): Likewise.
> 
> gcc/testsuite/
> 
>       PR target/118541
>       * gcc.target/powerpc/pr118541.c: New test.
>
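
The root cause described in the quoted message (reversing LT to UNGE) can be illustrated in plain C.  The sketch below is not taken from the patch; the helper names are hypothetical and it only models the comparison results, not the exception flags.  It shows why replacing !(a < b) with (a >= b) is only valid when NaNs cannot occur (which matches the -ffinite-math-only condition mentioned in the ChangeLog), while swapping the operands of a comparison is always valid.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper names for illustration; none are part of the patch.  */

/* Quiet select, the same shape as ordered_compare in the quoted message.  */
double
select_isgreater (double a, double b, double c, double d)
{
  return isgreater (a, b) ? c : d;
}

/* Does replacing !(a < b) with (a >= b) give the same answer?
   It does for ordered operands, but not when either one is a NaN,
   because every ordered comparison involving a NaN is false.  */
bool
reversal_preserves_result (double a, double b)
{
  bool negated = !(a < b);   /* true for a >= b OR unordered operands */
  bool reversed = (a >= b);  /* true only for a >= b */
  return negated == reversed;
}

/* Swapping the operands (a < b becomes b > a) never changes the result.  */
bool
swap_preserves_result (double a, double b)
{
  return (a < b) == (b > a);
}
```

With these definitions, reversal_preserves_result (1.0, 2.0) is true but reversal_preserves_result (NAN, 1.0) is false, whereas swap_preserves_result holds for both pairs.  The sketch assumes the compiler is not run with -ffast-math or -ffinite-math-only, since those options allow it to assume NaNs never occur.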
