https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582
--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 18 Feb 2022, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582
>
> --- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> True.
> So another option is to try to undo some of those short vectorization cases
> during isel, expansion or later, though e.g. for the negdi2 case it will go
> already during expansion into memory.

Yes, there are duplicates about this issue and it's really hard to solve
generally.  There's the possibility to try improving on the costing
side but currently the cost hooks just see

ix86_vector_costs::add_stmt_cost (this=0x41b88c0, count=1,
    kind=vec_construct, stmt_info=0x0, vectype=<vector_type 0x7ffff667a888>,
    misalign=0, where=vect_prologue)

so they have no idea about the feeding stmts.  The cost entry is
generated by vect_prologue_cost_for_slp which knows the scalar
operands but we do not pass the SLP node down to the cost hooks
(that's something on my list but my idea was to push it back
when we only have SLP nodes and thus could go w/o the stmt_info then).

The other possibility is (for the original testcase) to anticipate
that RTL expansion will expand 'w' to a TImode register and take
that as a reason to pessimize vectorization (but we don't know
how it's going to be used, so that's probably a flawed attempt).

The only short-term fixes are

 a) biasing the costing, regressing the from memory case,
 b) pass down the SLP node where available and look at the defs of
    the CTOR components, costing a gpr->xmm move where it can be
    anticipated.

b) is more future-proof, if we'd take that at this point I can see how
intrusive it would be.
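
To make the difference between a) and b) concrete, here is a toy model in C++. It is only a sketch and is not GCC code: DefKind, ToyCosts, cost_biased, cost_with_defs and all the cost numbers are made up for illustration. The real hook only receives the vec_construct entry shown above, which is why it cannot tell the two cases apart today.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: where a CTOR component's value comes from.
enum class DefKind { FromMemory, FromGpr };

// Hypothetical cost table; the numbers are placeholders, not real tuning data.
struct ToyCosts {
    int vec_construct_per_elt = 1; // assumed base cost per CTOR element
    int gpr_to_xmm_move = 3;       // assumed extra cost of a gpr->xmm move
};

// Option a): the hook knows nothing about the defs, so it can only add a
// flat bias. That penalizes the from-memory case as well.
int cost_biased(std::size_t nelts, const ToyCosts& c, int bias) {
    assert(bias >= 0 && "a bias is only meant to pessimize");
    return static_cast<int>(nelts) * c.vec_construct_per_elt + bias;
}

// Option b): the defs of the CTOR components are visible (as they would be
// if the SLP node were passed down), so only components that would need a
// gpr->xmm move are charged extra.
int cost_with_defs(const std::vector<DefKind>& defs, const ToyCosts& c) {
    int cost = 0;
    for (DefKind d : defs) {
        cost += c.vec_construct_per_elt;
        if (d == DefKind::FromGpr)
            cost += c.gpr_to_xmm_move;
    }
    return cost;
}
```

With these placeholder numbers, a two-element CTOR fed from memory keeps its base cost under b) but gets more expensive under a), while a CTOR fed from GPRs is charged for the anticipated moves only under b).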