https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 7 Mar 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
>
> --- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Richard Biener from comment #7)
> > Another change to mute the effect somewhat (but not fixing x264) that was
> > mentioned is
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b2bf90576d5..acf2cc977b4 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >        case vec_construct:
> >  	{
> >  	  /* N element inserts into SSE vectors.  */
> > -	  int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> > +	  int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * ix86_cost->sse_op;
>
> n - 1 is right for 128-bit vector, but for 256-bit vector, shouldn't it be
> n - 2, since we have a separate cost for vinserti128, and n - 4 for 512-bit one.

True! Note that without SLP the gpr->xmm move cost is not yet accounted for (for loops the cases where we will need an actual gpr->xmm move will be restricted to CTORs emitted in the prologue - in-loop cases will always come from memory, so it might not be too important to get that correct for the non-SLP case).
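The following minimal C sketch illustrates the counting rule proposed above: n - 1 element inserts for a 128-bit vector, n - 2 for 256-bit and n - 4 for 512-bit. The function names are hypothetical, and this is not GCC's implementation. The sketch assumes that the first element of each 128-bit lane needs no insert, since the move that brings it in is costed elsewhere or, per the reply, not yet at all. It also assumes that lane-combining instructions such as vinserti128 are charged separately, as the quoted comment says.

```c
#include <assert.h>

/* Hypothetical model of the element-insert part of an x86 vec_construct
   cost; NOT GCC's code.  A vector of VECTOR_BITS bits is treated as
   VECTOR_BITS / 128 SSE lanes.  The first element of each lane is
   assumed to arrive via a move that is not counted here, so every other
   element costs one insert.  Combining lanes (e.g. vinserti128) is
   assumed to be costed separately and is not modeled.  */
static int
vec_construct_insert_count (int nunits, int vector_bits)
{
  assert (vector_bits >= 128 && vector_bits % 128 == 0);
  int lanes = vector_bits / 128;
  assert (nunits >= lanes);
  return nunits - lanes;	/* n - 1, n - 2 or n - 4.  */
}

/* Scale the insert count by a per-insert cost, playing the role that
   ix86_cost->sse_op plays in the patch above.  */
static int
vec_construct_insert_cost (int nunits, int vector_bits, int sse_op)
{
  return vec_construct_insert_count (nunits, vector_bits) * sse_op;
}
```

For example, a 256-bit vector of eight 32-bit elements gives 8 - 2 = 6 element inserts under this model. The formula in the unpatched code charges 8.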