On 09.09.22 12:16, Richard Biener wrote:
On Fri, 9 Sep 2022, Tobias Burnus wrote:
-funsafe-math-optimizations implies -fno-signed-zeros, -fno-trapping-math,
-fassociative-math,
and -freciprocal-math. All of them reduce precision and may violate IEEE or
ISO/language standards.
However, I think it is rather surprising to suddenly get a precision of
only about 100,000,000 ULP instead of the ~4 ULP one would expect.
That's a precision loss of the order of 10^8 or 2^29, which is huge!
...
I agree - for example powerpc has -mrecip= to control which instructions
to use (float/double rsqrt or inverse) and -mrecip-precision to
specify whether further iteration is done or not.
[...]
Your suggested huge reduction in precision isn't usually acceptable
and should always be explicitly enabled.
First, I have to correct myself – Kwok's -munsafe-math-optimizations is
only about single-precision functions, which do not have this problem.
However, the pre-existing 'sqrt' problem is still real. It also applies
to reciprocal sqrt ("v_rsq"), which for whatever reason is not used for GCN.
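To get a feeling for the size of the problem, the precision loss can be modelled on any host. The following is only a sketch under stated assumptions, not GCN code: the helper names are hypothetical, and the hardware result is modelled by simply clearing the low 52 - 23 = 29 fraction bits of a correctly rounded double (the real instruction may round differently).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not GCN code: keep the top 23 of the 52 fraction
   bits of a double by clearing the low 52 - 23 = 29 bits.  */
static double keep_23_fraction_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~((UINT64_C(1) << 29) - 1);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Distance between two positive finite doubles, in double-precision ULPs.
   For such values the bit patterns are ordered like the values.  */
static uint64_t ulp_distance(double a, double b)
{
    uint64_t ia, ib;
    assert(a > 0.0 && b > 0.0);
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}

/* Print the modelled error for one correctly rounded result.  */
static void report(double exact)
{
    double approx = keep_23_fraction_bits(exact);
    printf("%.17g -> %.17g: %llu ULP\n", exact, approx,
           (unsigned long long) ulp_distance(exact, approx));
}
```

Calling report() with the result of sqrt() from <math.h>, for instance report(sqrt(2.0)), prints the modelled error; in this model it is always below 2^29 ULP.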
This patch now adds a command-line flag, off by default, to choose
whether this behavior is wanted. I used the same name as aarch64,
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mlow-precision-sqrt
(aarch64 also has -mlow-precision-recip-sqrt, which is not (yet)
sensible for GCN.)
This patch was manually tested for all combinations, and I also looked at
insn-recog.cc, given that it is my first .md patch; it seems to work
fine.
OK for mainline, or are there comments or more suggestions? I also
included some wording for the release notes.
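For reference, the combinations in question can be written down as a small host-side model of the new insn condition. This is a sketch only: the enum and the function names are hypothetical stand-ins, and the parameters mirror flag_unsafe_math_optimizations and flag_mlow_precision_sqrt from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the single- and double-precision modes.  */
enum fp_mode { MODE_SF, MODE_DF };

/* Model of the new condition:
   unsafe-math && (mode is single precision || -mlow-precision-sqrt).  */
static bool hw_sqrt_enabled(bool unsafe_math, bool low_precision_sqrt,
                            enum fp_mode mode)
{
    return unsafe_math && (mode == MODE_SF || low_precision_sqrt);
}

/* Print all eight combinations and check that the new flag never
   changes the single-precision outcome.  */
static void print_combinations(void)
{
    for (int unsafe = 0; unsafe <= 1; unsafe++)
        for (int lowp = 0; lowp <= 1; lowp++) {
            printf("unsafe=%d low-precision=%d SF=%d DF=%d\n", unsafe, lowp,
                   hw_sqrt_enabled(unsafe, lowp, MODE_SF),
                   hw_sqrt_enabled(unsafe, lowp, MODE_DF));
            assert(hw_sqrt_enabled(unsafe, lowp, MODE_SF)
                   == hw_sqrt_enabled(unsafe, !lowp, MODE_SF));
        }
}
```

In this model the hardware sqrt is used for double precision only when both flags are set, while single precision depends on the unsafe-math flag alone.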
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial register:
München, HRB 106955
GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246]
GCN's hardware sqrt supports single and double precision; in both cases,
however, the result has only 23 bits for the fractional part of the
floating-point number (for double precision: instead of 52 bits).
This now adds -mlow-precision-sqrt, using the same naming as aarch64.
Before, the hardware builtin sqrt was always used with
-funsafe-math-optimizations; now it is only used for single precision.
For double precision, the new -mlow-precision-sqrt is explicitly
required in addition. As there is no rsqrt, this flag likewise
applies to 1/sqrt.
PR target/105246
gcc/ChangeLog:
* config/gcn/gcn.opt (mlow-precision-sqrt): New, off by default.
* config/gcn/gcn-valu.md (sqrt, v_sqrt): Require it unless SFmode.
* doc/invoke.texi (GCN): Add -mlow-precision-sqrt entry.
gcc/config/gcn/gcn-valu.md | 6 ++++--
gcc/config/gcn/gcn.opt | 7 +++++++
gcc/doc/invoke.texi | 11 +++++++++++
3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c33ae0c717..c7a0b562874 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2276,7 +2276,8 @@ (define_insn "sqrt<mode>2<exec>"
[(set (match_operand:V_FP 0 "register_operand" "= v")
(sqrt:V_FP
(match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))]
- "flag_unsafe_math_optimizations"
+ "(flag_unsafe_math_optimizations
+ && (<MODE>mode == V64SFmode || flag_mlow_precision_sqrt))"
"v_sqrt%i0\t%0, %1"
[(set_attr "type" "vop1")
(set_attr "length" "8")])
@@ -2285,7 +2286,8 @@ (define_insn "sqrt<mode>2"
[(set (match_operand:FP 0 "register_operand" "= v")
(sqrt:FP
(match_operand:FP 1 "gcn_alu_operand" "vSvB")))]
- "flag_unsafe_math_optimizations"
+ "(flag_unsafe_math_optimizations
+ && (<MODE>mode == SFmode || flag_mlow_precision_sqrt))"
"v_sqrt%i0\t%0, %1"
[(set_attr "type" "vop1")
(set_attr "length" "8")])
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 9606aaf0b1a..a3f341f7eb1 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -77,6 +77,13 @@ mgang-private-size=
Target RejectNegative Joined UInteger Var(gang_private_size_opt) Init(-1)
Amount of local data-share (LDS) memory to reserve for gang-private variables.
+mlow-precision-sqrt
+Target Var(flag_mlow_precision_sqrt) Optimization
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number.
+It also implies low-precision reciprocal sqrt.
+
Wopenacc-dims
Target Var(warn_openacc_dims) Warning
Warn about invalid OpenACC dimensions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c066219a7d..fdd6e41cade 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20192,6 +20192,17 @@ compiled code must match the device mode. The default is @samp{-mno-xnack}.
At present this option is a placeholder for support that is not yet
implemented.
+@item -mlow-precision-sqrt
+@itemx -mno-low-precision-sqrt
+@opindex mlow-precision-sqrt
+@opindex mno-low-precision-sqrt
+Enable the square root approximation for 64bit double precision;
+this reduces precision of square root results to 23 bits for the
+fractional part of the floating-point number (2@sup{29} ULP).
+It also implies low-precision reciprocal sqrt.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.
+
@end table
@node ARC Options
gcc-13/changes.html - GCN: document -mlow-precision-sqrt
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 390193ca..d335eab3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -179,6 +179,12 @@ a work-in-progress.</p>
<li>Support for the Instinct MI200 series devices (<a
href="https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html">
<code>gfx90a</code></a>) has been added.</li>
+ <li>The <code>-mlow-precision-sqrt</code> option (disabled by default)
+ has been added to use the hardware <code>sqrt</code> also for
+ double-precision floating point arguments; note that the result
+ only has a much reduced accuracy of 2<sup>29</sup> ULP. This
+ option requires <code>-funsafe-math-optimizations</code>
+ (implied by <code>-ffast-math</code>) in addition.</li>
</ul>
<!-- <h3 id="arc">ARC</h3> -->