https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #14)
> Counting latencies, I think vinserti64x2 is 1 cycle and vpinst is
> integer->sse move that is slower and set to 4 cycles.
> Overall it is wrong that we use addss cost to estimate vec_construct:
>
>       case vec_construct:
>         {
>           int n = TYPE_VECTOR_SUBPARTS (vectype);
>           /* N - 1 element inserts into an SSE vector, the possible
>              GPR -> XMM move is accounted for in add_stmt_cost.  */
>           if (GET_MODE_BITSIZE (mode) <= 128)
>             return (n - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
>           else if (GET_MODE_BITSIZE (mode) == 256)
>             return ((n - 2) * ix86_cost->sse_op
>                     + ix86_vec_cost (mode, ix86_cost->addss));
>           /* One vinserti64x4 and two vinserti128 for combining SSE
>              and AVX256 vectors to AVX512.  */
>           else if (GET_MODE_BITSIZE (mode) == 512)
>             return ((n - 4) * ix86_cost->sse_op
>                     + 3 * ix86_vec_cost (mode, ix86_cost->addss));
>           gcc_unreachable ();
>         }
>
> I think we may want to have ix86_cost->hard_register->integer_to_sse to cost
> the construction in integer modes instead of addss?

I have no recollection of why we are mixing sse_op and addss cost here ...
It's not an integer to SSE conversion either (again, the caller adjusts for
this in that case).  We seem to use sse_op for the element insert into an
SSE reg and addss for the insert of SSE regs into a YMM or ZMM reg.  I think
it's reasonable to change this to consistently use sse_op.