This adds a --param to allow disabling vectorization of floating
point inductions.  On top of -Ofast this should allow 549.fotonik3d_r
to not miscompare.

While I thought of a more elaborate way of disabling certain
vectorization kinds (reductions also came to my mind), this for now
simply uses a --param rather than some sophisticated -fvectorize-*
scheme.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I've verified that 549.fotonik3d_r miscompares with -Ofast
-march=znver2 and passes when adding --param vect-induction-float=0,
which should be valid at least for peak (but I guess also for base,
via FOPTIMIZE for example).  I did not benchmark against other
workarounds (it has been said that -fno-unsafe-math-optimizations or
other similar things work as well).

OK for trunk?

Thanks,
Richard.

2022-03-08  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/84201
	* params.opt (-param=vect-induction-float): Add.
	* doc/invoke.texi (vect-induction-float): Document.
	* tree-vect-loop.cc (vectorizable_induction): Honor
	param_vect_induction_float.

	* gcc.dg/vect/pr84201.c: New testcase.
---
 gcc/doc/invoke.texi                 |  3 +++
 gcc/params.opt                      |  4 ++++
 gcc/testsuite/gcc.dg/vect/pr84201.c | 22 ++++++++++++++++++++++
 gcc/tree-vect-loop.cc               |  8 ++++++++
 4 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr84201.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b01ffab566a..a0fa5e1cf43 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14989,6 +14989,9 @@ in an inner loop relative to the loop being vectorized. The factor applied
 is the maximum of the estimated number of iterations of the inner loop and
 this parameter. The default value of this parameter is 50.
 
+@item vect-induction-float
+Enable loop vectorization of floating point inductions.
+
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index f76f7839916..9561aa61a50 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1176,6 +1176,10 @@ Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 10000) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
+-param=vect-induction-float=
+Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
+Enable loop vectorization of floating point inductions.
+
 -param=vrp1-mode=
 Common Joined Var(param_vrp1_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param Optimization
 --param=vrp1-mode=[vrp|ranger] Specifies the mode VRP1 should operate in.
diff --git a/gcc/testsuite/gcc.dg/vect/pr84201.c b/gcc/testsuite/gcc.dg/vect/pr84201.c
new file mode 100644
index 00000000000..1cc6d1ff13c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr84201.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast --param vect-induction-float=0" } */
+
+void foo (float *a, float f, float s, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i] = f;
+      f += s;
+    }
+}
+
+void bar (double *a, double f, double s, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i] = f;
+      f += s;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1f30fc82ca1..7fcec12a3e9 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8175,6 +8175,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
       return false;
     }
 
+  if (FLOAT_TYPE_P (vectype) && !param_vect_induction_float)
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "floating point induction vectorization disabled\n");
+      return false;
+    }
+
   step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
   gcc_assert (step_expr != NULL_TREE);
   tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
-- 
2.34.1
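
For illustration only (not part of the patch): a minimal sketch of why
vectorizing a floating point induction can change results. The names
fill_scalar, fill_vector_model and count_mismatches are hypothetical,
and fill_vector_model is a hand-written model that assumes four lanes
starting at f + k*s and advancing by 4*s per vector iteration. It is
not the code GCC emits, only the same kind of reassociation.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar semantics of the testcase loop: one rounding per iteration,
   in source order.  */
void fill_scalar (float *a, float f, float s, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      a[i] = f;
      f += s;
    }
}

/* Hypothetical 4-lane model of a vectorized induction (an assumption,
   not GCC output): lane k starts at f + k*s and each lane is advanced
   by 4*s once per vector iteration.  */
void fill_vector_model (float *a, float f, float s, size_t n)
{
  assert (n % 4 == 0);   /* Keep the sketch free of an epilogue loop.  */
  float lane[4];
  for (int k = 0; k < 4; ++k)
    lane[k] = f + (float) k * s;
  float step = 4.0f * s;
  for (size_t i = 0; i < n; i += 4)
    for (int k = 0; k < 4; ++k)
      {
        a[i + k] = lane[k];
        lane[k] += step;
      }
}

/* Number of positions where the two result arrays differ.  */
size_t count_mismatches (const float *x, const float *y, size_t n)
{
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i)
    if (x[i] != y[i])
      ++bad;
  return bad;
}
```

With f = 16777216.0f (2^24) and s = 1.0f, and assuming strict IEEE
single precision evaluation, the scalar loop never advances because
f + 1 rounds back to f, while the lane model keeps moving since
f + 2 and f + 4 are representable. With f = 0 and s = 0.5f every
intermediate value is exact and both versions agree. This is the
kind of difference --param vect-induction-float=0 avoids by keeping
the induction scalar.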