https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 7 Mar 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929
> 
> --- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Richard Biener from comment #7)
> > Another change to mute the effect somewhat (but not fixing x264) that was
> > mentioned is
> > 
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b2bf90576d5..acf2cc977b4 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
> >        case vec_construct:
> >         {
> >           /* N element inserts into SSE vectors.  */
> > -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> > +         int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * ix86_cost->sse_op;
> n - 1 is right for 128-bit vector, but for 256-bit vector, shouldn't it be n -
> 2, since we have a separate cost for vinserti128, and n - 4 for 512-bit one.

True!  Note that without SLP the gpr->xmm move cost is not yet accounted
for.  For loops, the cases that need an actual gpr->xmm move are
restricted to CTORs emitted in the prologue; in-loop cases will always
come from memory, so getting this right for the non-SLP case might not
be too important.
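
To make the n - 1 / n - 2 / n - 4 arithmetic concrete, here is a minimal
standalone sketch.  It is not GCC code: element_inserts and
vec_construct_insert_cost are hypothetical names, and it assumes the
vector is built one 128-bit lane at a time, with the first element of
each lane placed by a move rather than an insert and the lane-combining
inserts (vinserti128 and friends) costed separately.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.  Number of element inserts
   needed to build a vector of NELTS elements that is VECTOR_BITS wide,
   assuming each 128-bit lane gets its first element for free and the
   lane-combining inserts are accounted for elsewhere.  */
static int
element_inserts (int nelts, int vector_bits)
{
  assert (vector_bits >= 128 && vector_bits % 128 == 0);
  int lanes = vector_bits / 128;	/* 1, 2 or 4 */
  assert (nelts >= lanes);
  return nelts - lanes;			/* n - 1, n - 2 or n - 4 */
}

/* Hypothetical cost under the same assumptions: one SSE_OP per
   element insert.  SSE_OP stands in for ix86_cost->sse_op.  */
static int
vec_construct_insert_cost (int nelts, int vector_bits, int sse_op)
{
  return element_inserts (nelts, vector_bits) * sse_op;
}
```

Under these assumptions, 4 x 32-bit elements in a 128-bit vector need 3
inserts, 8 elements in a 256-bit vector need 6, and 16 elements in a
512-bit vector need 12.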
