> On 15.07.2024 at 19:08, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener <rguent...@suse.de> writes:
>> The following adds a new --param for debugging the vectorizer's alignment
>> peeling by increasing the cost of aligned loads and stores.
>>
>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> This makes the PR115843 testcase fail again on trunk (but not on the
>> branch), seemingly uncovering another backend issue. It makes the
>> testcase get alignment peeling even with the zen4 costs fixed.
>>
>> Any objection?
>
> Seems ok to me. Not sure I understand the mechanics of how this makes
> the testcase get alignment peeling though. I'd assumed increasing the
> cost of aligned loads & stores would discourage peeling relative to
> unaligned accesses.

I guess it simulates the bug in the x86 backend where unaligned loads are
cheaper than aligned ones. Unfortunately, params are not signed integers, so
we can only bias with a positive value. I also chose to bias aligned accesses
because the cost of unaligned accesses depends on the actual misalignment and
the exact mode of operation.
Richard
> Thanks,
> Richard
>
>>
>> * params.opt (--param=vect-aligned-ldst-cost-bias): New.
>> * doc/invoke.texi (--param=vect-aligned-ldst-cost-bias): Document.
>> * tree-vect-stmts.cc (vect_get_store_cost): Honor
>> param_vect_aligned_ldst_cost_bias.
>> (vect_get_load_cost): Likewise.
>>
>> * gcc.dg/vect/pr115843.c: Use it.
>> ---
>> gcc/doc/invoke.texi | 4 ++++
>> gcc/params.opt | 4 ++++
>> gcc/testsuite/gcc.dg/vect/pr115843.c | 1 +
>> gcc/tree-vect-stmts.cc | 2 ++
>> 4 files changed, 11 insertions(+)
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 1360cae3986..e542cefbb4a 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -16914,6 +16914,10 @@ permit performing redundancy elimination after reload.
>> The maximum number of insns in loop header duplicated
>> by the copy loop headers pass.
>>
>> +@item vect-aligned-ldst-cost-bias
>> +Bias to apply to the cost of aligned loads and stores. This
>> +is useful for debugging only.
>> +
>> @item vect-epilogues-nomask
>> Enable loop epilogue vectorization using smaller vector size.
>>
>> diff --git a/gcc/params.opt b/gcc/params.opt
>> index 3c4369fa052..5f86d564421 100644
>> --- a/gcc/params.opt
>> +++ b/gcc/params.opt
>> @@ -1166,6 +1166,10 @@ Use direct poisoning/unpoisoning instructions for variables smaller or equal to
>> Common Joined UInteger Var(param_use_canonical_types) Init(1) IntegerRange(0, 1) Param
>> Whether to use canonical types.
>>
>> +-param=vect-aligned-ldst-cost-bias=
>> +Common Joined UInteger Var(param_vect_aligned_ldst_cost_bias) Init(0) Param Optimization
>> +Bias to apply to the cost of aligned loads and stores.
>> +
>> -param=vect-epilogues-nomask=
>> Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1) IntegerRange(0, 1) Param Optimization
>> Enable loop epilogue vectorization using smaller vector size.
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr115843.c b/gcc/testsuite/gcc.dg/vect/pr115843.c
>> index 1b3fe277209..6701fa3499a 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr115843.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr115843.c
>> @@ -1,3 +1,4 @@
>> +/* { dg-additional-options "--param vect-aligned-ldst-cost-bias=100" } */
>> /* { dg-additional-options "-mavx512vl --param vect-partial-vector-usage=2" { target { avx512f_runtime && avx512vl } } } */
>>
>> #include "tree-vect.h"
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index fc02e84b4b4..2502dbd5413 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -997,6 +997,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
>> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>> vector_store, stmt_info, 0,
>> vect_body);
>> + *inside_cost += param_vect_aligned_ldst_cost_bias * ncopies;
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> @@ -1049,6 +1050,7 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
>> {
>> *inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
>> stmt_info, 0, vect_body);
>> + *inside_cost += param_vect_aligned_ldst_cost_bias * ncopies;
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,