Richard Biener <rguent...@suse.de> writes:
> The following adds a new --param for debugging the vectorizers alignment
> peeling by increasing the cost of aligned stores.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> This makes the PR115843 testcase fail again on trunk (but not on the
> branch), seemingly uncovering another backend issue.  It makes the
> testcase get alignment peeling even with the zen4 costs fixed.
>
> Any objection?

Seems ok to me.  Not sure I understand the mechanics of how this makes the
testcase get alignment peeling though.  I'd assumed increasing the cost of
aligned loads & stores would discourage peeling relative to unaligned
accesses.

Thanks,
Richard

>
> * params.opt (--param=vect-aligned-ldst-cost-bias): New.
> * doc/invoke.texi (--param=vect-aligned-ldst-cost-bias): Document.
> * tree-vect-stmts.cc (vect_get_store_cost): Honor
> param_vect_aligned_ldst_cost_bias.
> (vect_get_load_cost): Likewise.
>
> * gcc.dg/vect/pr115843.c: Use it.
> ---
> gcc/doc/invoke.texi | 4 ++++
> gcc/params.opt | 4 ++++
> gcc/testsuite/gcc.dg/vect/pr115843.c | 1 +
> gcc/tree-vect-stmts.cc | 2 ++
> 4 files changed, 11 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1360cae3986..e542cefbb4a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16914,6 +16914,10 @@ permit performing redundancy elimination after
> reload.
> The maximum number of insns in loop header duplicated
> by the copy loop headers pass.
>
> +@item vect-aligned-ldst-cost-bias
> +Bias to apply to the cost of aligned loads and stores.  This
> +is useful for debugging only.
> +
> @item vect-epilogues-nomask
> Enable loop epilogue vectorization using smaller vector size.
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 3c4369fa052..5f86d564421 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1166,6 +1166,10 @@ Use direct poisoning/unpoisoning instructions for
> variables smaller or equal to
> Common Joined UInteger Var(param_use_canonical_types) Init(1)
> IntegerRange(0, 1) Param
> Whether to use canonical types.
>
> +-param=vect-aligned-ldst-cost-bias=
> +Common Joined UInteger Var(param_vect_aligned_ldst_cost_bias) Init(0) Param
> Optimization
> +Bias to apply to the cost of aligned loads and stores.
> +
> -param=vect-epilogues-nomask=
> Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1)
> IntegerRange(0, 1) Param Optimization
> Enable loop epilogue vectorization using smaller vector size.
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115843.c
> b/gcc/testsuite/gcc.dg/vect/pr115843.c
> index 1b3fe277209..6701fa3499a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr115843.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr115843.c
> @@ -1,3 +1,4 @@
> +/* { dg-additional-options "--param vect-aligned-ldst-cost-bias=100" } */
> /* { dg-additional-options "-mavx512vl --param vect-partial-vector-usage=2"
> { target { avx512f_runtime && avx512vl } } } */
>
> #include "tree-vect.h"
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fc02e84b4b4..2502dbd5413 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -997,6 +997,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info,
> int ncopies,
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> vector_store, stmt_info, 0,
> vect_body);
> + *inside_cost += param_vect_aligned_ldst_cost_bias * ncopies;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1049,6 +1050,7 @@ vect_get_load_cost (vec_info *, stmt_vec_info
> stmt_info, int ncopies,
> {
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
> stmt_info, 0, vect_body);
> + *inside_cost += param_vect_aligned_ldst_cost_bias * ncopies;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
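
For reference, the expectation voiced in the reply (that biasing aligned
accesses should make peeling look worse relative to unaligned accesses) can be
written down as a toy comparison.  This is only a sketch of that intuition, not
GCC's cost model: the struct, the function names and the cost values are all
hypothetical, and it assumes the bias applies to aligned accesses only, as the
param description says.  It does not explain why the PR115843 testcase ends up
being peeled with the bias set.

```c
#include <assert.h>

/* Toy cost table.  Hypothetical: these fields are made up for illustration
   and do not correspond to any GCC target cost table.  */
struct toy_costs
{
  unsigned aligned_cost;   /* one aligned vector load or store */
  unsigned unaligned_cost; /* one unaligned vector load or store */
  unsigned peel_overhead;  /* one-off cost of the peeled prologue */
};

/* Cost when peeling makes the accesses aligned.  The bias is added once per
   copy, mirroring "param_vect_aligned_ldst_cost_bias * ncopies" in the
   patch.  */
static unsigned
toy_cost_with_peeling (const struct toy_costs *c, unsigned ncopies,
                       unsigned bias)
{
  assert (ncopies > 0);
  return c->peel_overhead + (c->aligned_cost + bias) * ncopies;
}

/* Cost when the accesses stay unaligned.  Assumption: the bias does not
   apply on this path.  */
static unsigned
toy_cost_without_peeling (const struct toy_costs *c, unsigned ncopies)
{
  assert (ncopies > 0);
  return c->unaligned_cost * ncopies;
}

/* Nonzero if the toy model prefers peeling for alignment.  */
static int
toy_peeling_wins (const struct toy_costs *c, unsigned ncopies, unsigned bias)
{
  return toy_cost_with_peeling (c, ncopies, bias)
         < toy_cost_without_peeling (c, ncopies);
}
```

With a made-up table of { 10, 12, 4 } and four copies, this toy model prefers
peeling at bias 0 (44 versus 48) and rejects it at bias 100 (444 versus 48),
which is the direction the reply assumed.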