On 09.09.22 12:16, Richard Biener wrote:
On Fri, 9 Sep 2022, Tobias Burnus wrote:
-funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math,
-fassociative-math,
and -freciprocal-math. All of them reduce precision and my violate IEEE or
ISO/language standards.

However, I think it is rather surprising to have all of the sudden only a
precision of the order of 100,000,000 ULP instead of ~4 ULP as to be expected.
That's a precision loss of the order of 10^8 or 2^29 which is huge!
...
I agree - for example powerpc has -mrecip= to control which instructions
to use (float/double rsqrt or inverse) and -mrecip-precision to
specify whether further iteration is done or not.
[...]
Your suggested huge reduction in precision isn't usually acceptable
and should be always explicitely enabled.

First, I have to correct myself – Kwok's -munsafe-math-optimizations is
only about single-precision functions, which do not have this problem.

However, the pre-existing 'sqrt' problem still is real. It also applies
to reverse sqrt ("v_rsq"), but that's for whatever reason not used for GCN.

This patch now adds a commandline flag - off by default - to choose
whether this behavior is wanted. I did use the same name as aarch64,
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mlow-precision-sqrt
(the latter also has -mlow-precision-recip-sqrt, which is not (yet)
sensible for GCN.)

This patch was manually tested for all combinations and I also looked at
insn-recog.cc, given that it is my first .md patch – it it seems to work
fine.

OK for mainline – or are there comments or more suggestions? I also
included some word for the release notes.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246]

GCN's sqrt supports single and double precision; however, for either
the result has 23 bits for the fractional part of the floating-point
number. (For double precision: instead of 52 bits).

This adds now -mlow-precision-sqrt, using the same naming as aarch64.
Before, the hardware builtin sqrt was always used with
unsafe-math-optimiaztions, now only with single precision; for
double precision, the new -mlow-precision-sqrt is explicitly
required in addition. As there is no rsqrt, this flag likewise
applies to 1/sqrt.

	PR target/105246

gcc/ChangeLog:

	* config/gcn/gcn.opt (mlow-precision-sqrt): New, off by default.
	* config/gcn/gcn-valu.md (sqrt, v_sqrt): Require it unless SFmode.
	* doc/invoke.texi (GCN): Add -mlow-precision-sqrt entry.

 gcc/config/gcn/gcn-valu.md |  6 ++++--
 gcc/config/gcn/gcn.opt     |  7 +++++++
 gcc/doc/invoke.texi        | 11 +++++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c33ae0c717..c7a0b562874 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2276,7 +2276,8 @@ (define_insn "sqrt<mode>2<exec>"
   [(set (match_operand:V_FP 0 "register_operand"  "=  v")
 	(sqrt:V_FP
 	  (match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == V64SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
@@ -2285,7 +2286,8 @@ (define_insn "sqrt<mode>2"
   [(set (match_operand:FP 0 "register_operand"  "=  v")
 	(sqrt:FP
 	  (match_operand:FP 1 "gcn_alu_operand" "vSvB")))]
-  "flag_unsafe_math_optimizations"
+  "(flag_unsafe_math_optimizations
+    && (<MODE>mode == SFmode || flag_mlow_precision_sqrt))"
   "v_sqrt%i0\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..a3f341f7eb1 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -77,6 +77,13 @@ mgang-private-size=
 Target RejectNegative Joined UInteger Var(gang_private_size_opt) Init(-1)
 Amount of local data-share (LDS) memory to reserve for gang-private variables.
 
+mlow-precision-sqrt
+Target Var(flag_mlow_precision_sqrt) Optimization
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number.
+It also implies low-precision reciprocal sqrt.
+
 Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c066219a7d..fdd6e41cade 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20192,6 +20192,17 @@ compiled code must match the device mode.  The default is @samp{-mno-xnack}.
 At present this option is a placeholder for support that is not yet
 implemented.
 
+@item -mlow-precision-sqrt
+@itemx -mno-low-precision-sqrt
+@opindex mlow-precision-sqrt
+@opindex mno-low-precision-sqrt
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number (2@sup{29} ULP).
+It also implies low-precision reciprocal sqrt.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.
+
 @end table
 
 @node ARC Options
gcc-13/changes.html - GCN: document -mlow-precision-sqrt

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 390193ca..d335eab3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -179,6 +179,12 @@ a work-in-progress.</p>
   <li>Support for the Instinct MI200 series devices (<a
       href="https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html";>
       <code>gfx90a</code></a>) has been added.</li>
+  <li>The <code>-mlow-precision-sqrt</code> option (disabled by default)
+      has been added to use the hardware <code>sqrt</code> also for
+      double-precision floating point arguments; note that the result
+      only has much a reduced accurary of 2<sup>29</sup> ULP. This
+      option requires <code>-funsafe-math-optimizations</code>
+      (implied by <code>-ffast-math</code>) in addition.</li>
 </ul>
 
 <!-- <h3 id="arc">ARC</h3> -->

Reply via email to