https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
The target now has the ability to tell the vectorizer to choose a larger VF
based on the cost info it got for the default VF, so the x86 backend could
make use of that.  For example with the following patch we'll unroll the
vectorized loops 4 times (of course the actual check for small reduction
loops and a register pressure estimate is missing).  That generates

.L4:
        vaddps  (%rax), %zmm1, %zmm1
        vaddps  64(%rax), %zmm2, %zmm2
        addq    $256, %rax
        vaddps  -128(%rax), %zmm0, %zmm0
        vaddps  -64(%rax), %zmm3, %zmm3
        cmpq    %rcx, %rax
        jne     .L4
        movq    %rdx, %rax
        andq    $-64, %rax
        vaddps  %zmm3, %zmm0, %zmm0
        vaddps  %zmm2, %zmm1, %zmm1
        vaddps  %zmm1, %zmm0, %zmm1
... more epilog ...

with -march=znver4 on current trunk.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d4ff56ee8dd..53c09bb9d9c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23615,8 +23615,18 @@ class ix86_vector_costs : public vector_costs
                              stmt_vec_info stmt_info, slp_tree node,
                              tree vectype, int misalign,
                              vect_cost_model_location where) override;
+  void finish_cost (const vector_costs *uncast_scalar_costs);
 };
 
+void
+ix86_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
+{
+  auto *scalar_costs
+    = static_cast<const ix86_vector_costs *> (uncast_scalar_costs);
+  m_suggested_unroll_factor = 4;
+  vector_costs::finish_cost (scalar_costs);
+}
+
 /* Implement targetm.vectorize.create_costs.  */
 
 static vector_costs *
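
For readers following the assembly above: the .L4 loop keeps four
independent accumulators (zmm0 to zmm3) and only combines them after the
loop.  A rough scalar sketch of that shape follows.  It assumes the loop is
a plain float sum reduction, which is what the vaddps sequence suggests; the
function name sum_unrolled4 is hypothetical, and each scalar partial sum
stands in for one whole vector register.

```cpp
#include <cstddef>

// Hypothetical illustration, not code from the PR or from GCC.
// Sums n floats using four independent partial sums, mirroring the four
// accumulators in the unrolled vector loop.
float
sum_unrolled4 (const float *a, std::size_t n)
{
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  std::size_t i = 0;

  // Main loop: four additions per iteration with no dependency between
  // them, so the add latency of one does not stall the others.
  for (; i + 4 <= n; i += 4)
    {
      s0 += a[i];
      s1 += a[i + 1];
      s2 += a[i + 2];
      s3 += a[i + 3];
    }

  // Combine the partial sums pairwise, as the three vaddps after the
  // loop do for the vector accumulators.
  float sum = (s3 + s2) + (s1 + s0);

  // Scalar tail for the remaining 0 to 3 elements.
  for (; i < n; ++i)
    sum += a[i];
  return sum;
}
```

Note that splitting the sum this way reassociates the floating-point
additions, so the result can differ in rounding from a strictly sequential
sum.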