On Fri, Mar 11, 2022 at 8:43 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> After accounting for GPR -> XMM move cost for vec_construct the
> base cost needs adjustments to not double-cost those.  This also
> lowers the cost when such move is not necessary.
>
> This fixes the observed 538.imagick_r and 525.x264_r regressions
> for me on Zen2 with -Ofast -march=native.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?

LGTM.

>
> Thanks,
> Richard.
>
> 2022-03-11  Richard Biener  <rguent...@suse.de>
>
>         PR target/104762
>         * config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
>         cost the first lane of SSE pieces as inserts for vec_construct.
> ---
>  gcc/config/i386/i386.cc | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 4121f986221..23bedea92bd 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22597,16 +22597,21 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>
>        case vec_construct:
>         {
> -         /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int n = TYPE_VECTOR_SUBPARTS (vectype);
> +         /* N - 1 element inserts into an SSE vector, the possible
> +            GPR -> XMM move is accounted for in add_stmt_cost.  */
> +         if (GET_MODE_BITSIZE (mode) <= 128)
> +           return (n - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
> -         if (GET_MODE_BITSIZE (mode) == 256)
> -           cost += ix86_vec_cost (mode, ix86_cost->addss);
> +         else if (GET_MODE_BITSIZE (mode) == 256)
> +           return ((n - 2) * ix86_cost->sse_op
> +                   + ix86_vec_cost (mode, ix86_cost->addss));
>           /* One vinserti64x4 and two vinserti128 for combining SSE
>              and AVX256 vectors to AVX512.  */
>           else if (GET_MODE_BITSIZE (mode) == 512)
> -           cost += 3 * ix86_vec_cost (mode, ix86_cost->addss);
> -         return cost;
> +           return ((n - 4) * ix86_cost->sse_op
> +                   + 3 * ix86_vec_cost (mode, ix86_cost->addss));
> +         gcc_unreachable ();
>         }
>
>        default:
> --
> 2.34.1
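
To make the arithmetic in the quoted hunk concrete, here is a standalone
sketch (not GCC code) of the old and new vec_construct base cost.  The
struct, the function names and the unit cost values are made up for
illustration: sse_op stands in for ix86_cost->sse_op and vec_op for
ix86_vec_cost (mode, ix86_cost->addss).  The sketch also assumes the
vector size is one of the cases the patch handles (<= 128, 256 or 512
bits); the real code ends in gcc_unreachable () for anything else.

```cpp
// Illustrative model only; names and values are hypothetical.
struct Costs
{
  int sse_op;  // stand-in for ix86_cost->sse_op
  int vec_op;  // stand-in for ix86_vec_cost (mode, ix86_cost->addss)
};

// Before the patch: n element inserts, plus one combine for 256 bits
// or three combines for 512 bits.
constexpr int
old_cost (int n, int bits, Costs c)
{
  return n * c.sse_op
	 + (bits == 256 ? c.vec_op : bits == 512 ? 3 * c.vec_op : 0);
}

// After the patch: the first lane of each 128-bit piece is no longer
// costed as an insert (1, 2 or 4 pieces for 128, 256 or 512 bits).
// Anything wider than 256 bits is treated as 512 here (assumption).
constexpr int
new_cost (int n, int bits, Costs c)
{
  return bits <= 128 ? (n - 1) * c.sse_op
	 : bits == 256 ? (n - 2) * c.sse_op + c.vec_op
	 : (n - 4) * c.sse_op + 3 * c.vec_op;
}

// With unit costs, the saving equals the number of 128-bit pieces.
constexpr Costs unit{1, 1};
static_assert (old_cost (4, 128, unit) == 4, "4 inserts");
static_assert (new_cost (4, 128, unit) == 3, "3 inserts");
static_assert (old_cost (8, 256, unit) == 9, "8 inserts + 1 combine");
static_assert (new_cost (8, 256, unit) == 7, "6 inserts + 1 combine");
static_assert (old_cost (16, 512, unit) == 19, "16 inserts + 3 combines");
static_assert (new_cost (16, 512, unit) == 15, "12 inserts + 3 combines");
```

In this model the base cost drops by one sse_op per 128-bit piece, which
matches the ChangeLog wording about not costing the first lane of SSE
pieces as inserts.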

--
BR,
Hongtao