[PATCH] -fftz-math: assume that denorms _must_ be flushed to zero optimizations

Pekka Jääskeläinen Thu, 10 Aug 2017 09:40:05 -0700

Hi,

The attached patch adds a new switch, -fftz-math, which makes certain
optimizations assume that "flush to zero" behavior for denormal inputs
and outputs is not an optimization hint but required behavior for
semantic correctness.
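
To illustrate the difference, here is a minimal sketch in C that models
the required behavior in software. The helper names ftz_flush and
ftz_mul are hypothetical and not part of the patch, and keeping the sign
of the flushed zero is an assumption of the sketch rather than something
the patch defines.

```c
#include <math.h>

/* Hypothetical helper (not part of the patch): flush a subnormal value
   to zero.  Keeping the sign of the zero is an assumption made here.  */
static float ftz_flush(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Hypothetical model of a multiply that must flush both its inputs
   and its result, as the text describes for mandatory FTZ.  */
static float ftz_mul(float a, float b)
{
    return ftz_flush(ftz_flush(a) * ftz_flush(b));
}
```

With these definitions, ftz_mul(0x1p-140f, 1.0f) yields zero rather than
the subnormal input, which is why folding X * 1 to X is not valid when
flushing is mandatory.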


The need for this arose from HSAIL (BRIG). With HSAIL, flush-to-zero
handling is required (not only "allowed") when an HSAIL instruction is
marked with the 'ftz' modifier (all HSA Base profile instructions are).

The patch is not complete and likely misses many optimizations.
However, it is a starting point that fixes a few cases brought out by
the HSAIL conformance suite. We plan to extend it as new cases come up.

OK for trunk?

BR,
Pekka

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 251026)
+++ gcc/common.opt	(working copy)
@@ -2281,6 +2281,11 @@
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations treat floating-point operations as if they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 251026)
+++ gcc/doc/invoke.texi	(working copy)
@@ -9458,6 +9458,17 @@
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental.  With this flag on, GCC treats
+floating-point operations (except abs, class, copysign and neg) as if
+they must flush subnormal input operands and results to zero
+(FTZ).  The FTZ rules are derived from the HSA Programmer's Reference
+Manual for the base profile.  This alters optimizations that would
+break the rules, for example the X * 1 -> X simplification.  The option
+assumes the target supports FTZ in hardware and has it enabled, either
+by default or set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	(revision 251026)
+++ gcc/fold-const-call.c	(working copy)
@@ -697,7 +697,7 @@
 	      && do_mpfr_arg1 (result, mpfr_y1, arg, format));
 
     CASE_CFN_FLOOR:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_floor (result, format, arg);
 	  return true;
@@ -705,7 +705,7 @@
       return false;
 
     CASE_CFN_CEIL:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_ceil (result, format, arg);
 	  return true;
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 251026)
+++ gcc/match.pd	(working copy)
@@ -143,6 +143,7 @@
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -151,6 +152,7 @@
 (simplify
  (mult @0 real_minus_onep)
   (if (!HONOR_SNANS (type)
+       && !flag_ftz_math
        && (!HONOR_SIGNED_ZEROS (type)
            || !COMPLEX_FLOAT_TYPE_P (type)))
    (negate @0)))
@@ -332,13 +334,13 @@
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 251026)
+++ gcc/simplify-rtx.c	(working copy)
@@ -2565,8 +2565,10 @@
 	return op1;
 
       /* In IEEE floating point, x*1 is not equivalent to x for
-	 signalling NaNs.  */
+	 signalling NaNs.
+	 For -fftz-math, x*1 is not equivalent to x for subnormals. */
       if (!HONOR_SNANS (mode)
+	  && (!FLOAT_MODE_P (mode) || !flag_ftz_math)
 	  && trueop1 == CONST1_RTX (mode))
 	return op0;
