https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #39 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguent...@suse.de from comment #38)
> On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
> >
> > --- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> ---
> > (In reply to Richard Biener from comment #35)
> > > (In reply to H.J. Lu from comment #33)
> > > > Created attachment 61995 [details]
> > > > An updated patch
> > > >
> > > > Please try this.
> > >
> > > Looking at the patch I do wonder about
> > >
> > > static void
> > > ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
> > >                               rtx inner_scalar = nullptr)
> > > {
> > >   basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> > >   while (bb->loop_father->latch
> > >          != EXIT_BLOCK_PTR_FOR_FN (cfun))
> > >     bb = get_immediate_dominator (CDI_DOMINATORS,
> > >                                   bb->loop_father->header);
> > >
> > > when the nearest common dominator is a BB in a loop nest like
> > >
> > >   loop {
> > >     loop {
> > >     }
> > >
> > >     loop {
> > >       BB;
> > >     }
> > >     BB';
> > >   }
> > >
> > > this will skip an arbitrary number of earlier sibling loops.  I think
> > > if we want to do such additional hoisting at all - for a splat of a
> > > non-constant we have to ensure the set of the source we splat is still
> > > dominating the insertion point (where's that done?) - it IMO only
> > > makes sense (without extra costing) to hoist the set out of a perfect
> > > nest, thus never across earlier sibling loops.  Even for BB' this is
> > > likely problematic.
> >
> > Since my patch works, I'd like to keep it as is.  Will it work for you?
>
> Sure.  Where is it ensured the splat isn't inserted before it is set?

With my patch, we got

  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
  /* For X86_CSE_VEC_DUP, don't place the vector set outside of
     the loop to avoid extra spills.  */
  if (!load || load->kind != X86_CSE_VEC_DUP)
    {
      while (bb->loop_father->latch
             != EXIT_BLOCK_PTR_FOR_FN (cfun))
        bb = get_immediate_dominator (CDI_DOMINATORS,
                                      bb->loop_father->header);
    }

When load->kind == X86_CSE_VEC_DUP, bb is the nearest common dominator which
may be inside the loop.
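
To make the two placement behaviours concrete, here is a minimal, self-contained sketch of the logic on a toy CFG shaped like the loop nest from comment #35. Nothing in it is GCC code: `ToyBlock`, `ToyLoop`, `ToyCfg`, `place_vector_set` and `example_cfg` are hypothetical names, the dominator tree is written down by hand, and the `is_vec_dup` flag stands in for the `load && load->kind == X86_CSE_VEC_DUP` case. It assumes, as the loop condition in the snippet suggests, that only the root of the loop tree has the exit block as its latch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for CFG data; these are NOT real GCC types.
struct ToyLoop  { int header; int latch; };
struct ToyBlock { int idom; int loop_father; };  // idom == -1 for the entry block

struct ToyCfg {
  std::vector<ToyBlock> blocks;
  std::vector<ToyLoop> loops;  // loops[0] is the root of the loop tree
  int exit_block;
};

// Depth of a block in the dominator tree.
static int dom_depth(const ToyCfg &g, int bb) {
  int d = 0;
  while (g.blocks[bb].idom != -1) { bb = g.blocks[bb].idom; ++d; }
  return d;
}

// Nearest common dominator of two blocks: level the depths, then walk up together.
static int nearest_common_dominator(const ToyCfg &g, int a, int b) {
  int da = dom_depth(g, a), db = dom_depth(g, b);
  while (da > db) { a = g.blocks[a].idom; --da; }
  while (db > da) { b = g.blocks[b].idom; --db; }
  while (a != b) { a = g.blocks[a].idom; b = g.blocks[b].idom; }
  return a;
}

// Mirrors the shape of the quoted snippet: start at the nearest common
// dominator of all users; unless this is the vec-dup case, keep stepping to
// the immediate dominator of the enclosing loop's header until no real loop
// contains the block.
static int place_vector_set(const ToyCfg &g, const std::vector<int> &bbs,
                            bool is_vec_dup) {
  assert(!bbs.empty());
  int bb = bbs[0];
  for (std::size_t i = 1; i < bbs.size(); ++i)
    bb = nearest_common_dominator(g, bb, bbs[i]);
  if (!is_vec_dup)
    while (g.loops[g.blocks[bb].loop_father].latch != g.exit_block)
      bb = g.blocks[g.loops[g.blocks[bb].loop_father].header].idom;
  return bb;
}

// Hand-built example (an assumption for illustration):
//   0 ENTRY, 1 preheader, 2 outer-loop header, 3 first inner loop (one block),
//   4 second inner loop header, 5 BB, 6 BB', 7 EXIT.
static ToyCfg example_cfg() {
  ToyCfg g;
  g.blocks = {{-1, 0}, {0, 0}, {1, 1}, {2, 2}, {3, 3}, {4, 3}, {4, 1}, {2, 0}};
  g.loops = {{0, 7}, {2, 6}, {3, 3}, {4, 5}};
  g.exit_block = 7;
  return g;
}
```

With the only user in `BB` (block 5), the non-vec-dup path walks 5, 3, 2, 1 and ends in the preheader outside all loops, stepping across the earlier sibling loop (block 3) on the way, which is the situation comment #35 describes. With `is_vec_dup` set, the result stays at block 5, inside the inner loop. The sketch says nothing about whether the splat source is already set at the chosen block; it only shows which block the two branches select.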