On Mon, Sep 22, 2025 at 4:22 AM liuhongt <[email protected]> wrote:
>
> Since it regressed SPEC performance (refer to PR121994), I guess
> it's related to register pressure and can be tuned by adjusting
> reduc_lat_mult_thr. I don't have a Zen2 machine, so for simplicity, I'll
> just disable unrolling in the vectorizer for Zen2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.

Please also update znver1_cost.

I'll note that, since the reduc_lat_mult_thr tunings are not mode dependent,
the throughput numbers for both znver1 (split AVX pipeline) and znver4 (split
AVX512 pipeline) are not modeled correctly in all cases.  This could possibly
be mitigated in ix86_vector_costs::finish_cost by dividing by two when
loop_vinfo->vector_mode is of an affected size.

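To illustrate the mitigation described above, here is a minimal standalone
sketch.  It is not GCC code: tune_sketch, split_vector_bytes and
suggested_unroll are hypothetical names, the numbers in the usage note are
made up, and it assumes that the quantity to halve is the latency*throughput
threshold before it is turned into an unroll factor.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the relevant tuning fields; not GCC's
// processor_costs layout.
struct tune_sketch {
  unsigned reduc_lat_mult_thr;  // latency * throughput threshold for a reduction
  unsigned vect_unroll_limit;   // cap on the vectorizer's unroll factor (>= 1)
  unsigned split_vector_bytes;  // vector size issued as two half-width ops,
                                // 0 if the target has no split pipeline
};

// Sketch of a mode-aware unroll suggestion: halve the threshold when the
// chosen vector size is the one the hardware splits in two, then spread it
// over the reductions in the loop and clamp to [1, vect_unroll_limit].
unsigned suggested_unroll(const tune_sketch &t, unsigned vector_bytes,
                          unsigned num_reductions)
{
  assert(t.vect_unroll_limit >= 1);
  if (num_reductions == 0)
    return 1;  // nothing to hide latency for

  unsigned thr = t.reduc_lat_mult_thr;
  if (t.split_vector_bytes != 0 && vector_bytes == t.split_vector_bytes)
    thr = std::max(1u, thr / 2);  // split pipeline: half the throughput

  // Ceiling division, so a leftover fraction still gets one more copy.
  unsigned unroll = (thr + num_reductions - 1) / num_reductions;
  return std::max(1u, std::min(unroll, t.vect_unroll_limit));
}
```

With a made-up tuning of {8, 8, 32}, a 16-byte vector mode keeps the full
threshold while a 32-byte mode gets half of it.  Setting vect_unroll_limit
to 1, as the patch does for znver2_cost, clamps the result to 1 regardless
of the threshold.
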
> gcc/ChangeLog:
>
> PR target/121994
> * config/i386/x86-tune-costs.h (znver2_cost): Set
> vect_unroll_limit to 1.
> ---
> gcc/config/i386/x86-tune-costs.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/x86-tune-costs.h
> b/gcc/config/i386/x86-tune-costs.h
> index 1649ea2fe3e..e9e4ddf108a 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1918,7 +1918,7 @@ struct processor_costs znver2_cost = {
> FMA/DOT_PROD_EXPR/SAD_EXPR,
> it's used to determine unroll
> factor in the vectorizer. */
> - 4, /* Limit how much the autovectorizer
> + 1, /* Limit how much the autovectorizer
> may unroll a loop. */
> znver2_memcpy,
> znver2_memset,
> --
> 2.34.1
>