The following patch adds support for PTX's rcp.rn.f32 and rcp.rn.f64 instructions. Note that the "rcp.rn" forms of this instruction calculate the fully IEEE-compliant result for the reciprocal, unlike the rcp.approx variants, which only provide fast approximations.
I'm undecided as to whether to prefix this instruction name with "*" as this pattern is almost standard, so I can imagine the middle-end potentially adding optab support for recip<mode>2 expansion at some point. I'll go with the (backend) reviewer's recommendation?

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu with "make" and "make check" with no new regressions. Ok for mainline?

2020-07-12  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * config/nvptx/nvptx.md (recip<mode>2): New instruction.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/recip-1.c: New test.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6545b81..ea6fc94 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -872,6 +872,15 @@
   ""
   "%.\\tfma%#%t0\\t%0, %1, %2, %3;")
 
+(define_insn "recip<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (div:SDFM
+          (match_operand:SDFM 2 "const_double_operand" "F")
+          (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  "CONST_DOUBLE_P (operands[2])
+   && real_identical (CONST_DOUBLE_REAL_VALUE (operands[2]), &dconst1)"
+  "%.\\trcp%#%t0\\t%0, %1;")
+
 (define_insn "div<mode>3"
   [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
         (div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
diff --git a/gcc/testsuite/gcc.target/nvptx/recip-1.c b/gcc/testsuite/gcc.target/nvptx/recip-1.c
new file mode 100644
index 0000000..9a165f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/recip-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+double foo(double x)
+{
+  return 1.0/x;
+}
+
+float foof(float x)
+{
+  return 1.0f/x;
+}
+
+/* { dg-final { scan-assembler-times "rcp.rn.f64" 1 } } */
+/* { dg-final { scan-assembler-times "rcp.rn.f32" 1 } } */
+
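
For illustration only (this is not part of the patch), here is a small C sketch of which source forms the new pattern is meant to catch. The function names are hypothetical. The insn condition requires the numerator to be identical to 1.0, so I would expect only the first two functions to use rcp.rn, and a numerator such as 2.0 to still go through the existing div<mode>3 pattern; I have not checked the generated PTX for the third case.

```c
/* Hypothetical illustration, not part of the patch.  */

/* Numerator is exactly 1.0: expected to match recip<mode>2 in DFmode
   and emit rcp.rn.f64.  */
double recip_double (double x)
{
  return 1.0 / x;
}

/* Numerator is exactly 1.0f: expected to match recip<mode>2 in SFmode
   and emit rcp.rn.f32.  */
float recip_float (float x)
{
  return 1.0f / x;
}

/* Numerator is 2.0, which is not identical to dconst1, so the insn
   condition fails.  Assumption: this falls back to div<mode>3.  */
double two_over (double x)
{
  return 2.0 / x;
}
```

Since rcp.rn is the IEEE-compliant form, the results should be the same as those of an ordinary division, e.g. recip_double (4.0) yields 0.25 on any conforming target.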