On 10/7/24 10:52, Richard Biener wrote:
On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento
<victor.donascime...@arm.com> wrote:
Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.
This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors. For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.
As numerical simulation programs often do bounds check on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code. If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.
We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.
Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.
I think the patch is good - note I think there's even a bugzilla about this
behavior. So as incremental improvement the patch is OK.
Thanks, this will be particularly useful in allowing us to vectorize
more math code without the worry of the trapping performance penalty,
unblocking some patches that were effectively put on hold pending
some solution to issue.
I think we should further improve this, possibly with profile data, and
the situation could be improved by handling the calls like mask stores
where we put a if (mask != 0) before the store.
Hmmm... Interesting that this behavior has already made it into
Bugzilla. I'm guessing you're referring to `Bug 65425 - code
optimization leads to spurious FP exception'.
I will look into how we may refine the handling of the issue with the
aim of submitting a follow-up patch, as per your suggestion.
Many thanks,
Victor.
Richard.
gcc/ChangeLog:
* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
function declaration before IFN fallback.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-fncall-mask-math.c: New.
---
.../gcc.dg/vect/vect-fncall-mask-math.c | 33 +++++++++++++++++++
gcc/tree-if-conv.cc | 18 +++++-----
2 files changed, 42 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 00000000000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@
+/* Test the correct application of masking to autovectorized math function
calls.
+ Test is currently set to xfail pending the release of the relevant lmvec
+ support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
{ target { aarch64*-*-* } } } */
+
+#include <math.h>
+
+const int N = 20;
+const float lim = 101.0;
+const float cst = -1.0;
+float tot = 0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+ [10 ... 19] = 100.0 }; /* Else branch. */
+
+int main (void)
+{
+ #pragma omp simd
+ for (int i = 0; i < N; i += 1)
+ {
+ if (a[i] > lim)
+ b[i] = cst;
+ else
+ b[i] = expf (a[i]);
+ tot += b[i];
+ }
+ return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call <expf, _2, _1>} ifcvt { xfail
{ aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>}
ifcvt { xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt,
vec<data_reference_p> refs)
case GIMPLE_CALL:
{
- /* There are some IFN_s that are used to replace builtins but have the
- same semantics. Even if MASK_CALL cannot handle them vectorable_call
- will insert the proper selection, so do not block conversion. */
- int flags = gimple_call_flags (stmt);
- if ((flags & ECF_CONST)
- && !(flags & ECF_LOOPING_CONST_OR_PURE)
- && gimple_call_combined_fn (stmt) != CFN_LAST)
- return true;
-
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl)
{
@@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt,
vec<data_reference_p> refs)
}
}
+ /* There are some IFN_s that are used to replace builtins but have the
+ same semantics. Even if MASK_CALL cannot handle them vectorable_call
+ will insert the proper selection, so do not block conversion. */
+ int flags = gimple_call_flags (stmt);
+ if ((flags & ECF_CONST)
+ && !(flags & ECF_LOOPING_CONST_OR_PURE)
+ && gimple_call_combined_fn (stmt) != CFN_LAST)
+ return true;
+
return false;
}
--
2.34.1