https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414
--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #13)
> The target now has the ability to tell the vectorizer to choose a larger VF
> based on the cost info it got for the default VF, so the x86 backend could
> make use of that. For example with the following patch we'll unroll the
> vectorized loops 4 times (of course the actual check for small reduction
> loops and a register pressure estimate is missing). That generates
>
> .L4:
>         vaddps  (%rax), %zmm1, %zmm1
>         vaddps  64(%rax), %zmm2, %zmm2
>         addq    $256, %rax
>         vaddps  -128(%rax), %zmm0, %zmm0
>         vaddps  -64(%rax), %zmm3, %zmm3
>         cmpq    %rcx, %rax
>         jne     .L4
>         movq    %rdx, %rax
>         andq    $-64, %rax
>         vaddps  %zmm3, %zmm0, %zmm0
>         vaddps  %zmm2, %zmm1, %zmm1
>         vaddps  %zmm1, %zmm0, %zmm1
> ... more epilog ...
>
> with -march=znver4 on current trunk.
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d4ff56ee8dd..53c09bb9d9c 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23615,8 +23615,18 @@ class ix86_vector_costs : public vector_costs
>                               stmt_vec_info stmt_info, slp_tree node,
>                               tree vectype, int misalign,
>                               vect_cost_model_location where) override;
> +  void finish_cost (const vector_costs *uncast_scalar_costs);
>  };
>
> +void
> +ix86_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
> +{
> +  auto *scalar_costs
> +    = static_cast<const ix86_vector_costs *> (uncast_scalar_costs);
> +  m_suggested_unroll_factor = 4;
> +  vector_costs::finish_cost (scalar_costs);

I remember we posted a patch for that:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604186.html

One regression we observed is that, after unrolling the vectorized loop, the VF of the epilogue loop increases (from xmm to ymm), which regressed performance for lower-trip-count loops (similar to -mprefer-vector-width=512).
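To make the trip-count concern concrete, here is a small C++ sketch of how a trip count n splits between the vectorized main loop and what is left over for the epilogue. It is an illustration only, not GCC code: it assumes 16 floats per zmm vector (as in the quoted vaddps loop) and models unrolling simply as multiplying the number of elements consumed per main-loop iteration. The helper names main_loop_elements and remainder_elements are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not GCC code): number of elements out of n that the
// vectorized main loop covers when each iteration consumes vf * unroll
// elements. With vf = 16 (floats per zmm) and unroll = 4 this is n rounded
// down to a multiple of 64, which appears to be what the "andq $-64" in the
// quoted assembly computes.
std::size_t main_loop_elements(std::size_t n, std::size_t vf,
                               std::size_t unroll) {
  assert(vf > 0 && unroll > 0);
  std::size_t step = vf * unroll;
  return n / step * step;  // largest multiple of step not exceeding n
}

// Hypothetical helper: elements left for the epilogue (and any scalar tail).
std::size_t remainder_elements(std::size_t n, std::size_t vf,
                               std::size_t unroll) {
  return n - main_loop_elements(n, vf, unroll);
}
```

For n = 40 with an unroll factor of 4, main_loop_elements(40, 16, 4) is 0, so all 40 elements fall through to the epilogue; without unrolling, 32 of them run in the zmm main loop and only 8 are left over. Low-trip-count loops therefore spend proportionally more of their time in the epilogue once the unroll factor is raised, which is where the epilogue's vector width starts to matter.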
Also, for the case in this PR, I'm trying to enable -fvariable-expansion-in-unroller when -funroll-loops is given, so that the partial sums break the reduction chain.
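The reduction-chain point can be illustrated with a scalar C++ sketch. This is an illustration of the technique, not the code GCC emits, and the function names are hypothetical. With a single accumulator every add waits on the previous one; splitting it into independent partial sums, which is what variable expansion does when unrolling, lets the adds overlap, much like the four zmm accumulators in the quoted assembly. Note that reassociating floating-point additions can in general change the rounded result, so a compiler can only do this for float accumulators when reassociation is permitted (for example under -fassociative-math).

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: each addition depends on the previous one, so the loop
// is limited by add latency along a single reduction chain.
float sum_single(const float *a, std::size_t n) {
  assert(n == 0 || a != nullptr);
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Unrolled by 4 with four independent partial sums (the effect variable
// expansion aims for): the four chains can execute in parallel and are only
// combined after the loop. This reassociates the sum.
float sum_partial(const float *a, std::size_t n) {
  assert(n == 0 || a != nullptr);
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i];
    s1 += a[i + 1];
    s2 += a[i + 2];
    s3 += a[i + 3];
  }
  for (; i < n; ++i)  // leftover elements when n is not a multiple of 4
    s0 += a[i];
  return (s0 + s1) + (s2 + s3);  // combine the partial sums
}
```

For inputs whose partial sums are exactly representable (such as small integers), both functions return the same value; for general float data the results may differ in the last bits because the additions happen in a different order.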