https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #38 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
>
> --- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Richard Biener from comment #35)
> > (In reply to H.J. Lu from comment #33)
> > > Created attachment 61995 [details]
> > > An updated patch
> > >
> > > Please try this.
> >
> > Looking at the patch I do wonder about
> >
> > static void
> > ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
> >                               rtx inner_scalar = nullptr)
> > {
> >   basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> >   while (bb->loop_father->latch
> >          != EXIT_BLOCK_PTR_FOR_FN (cfun))
> >     bb = get_immediate_dominator (CDI_DOMINATORS,
> >                                   bb->loop_father->header);
> >
> > when the nearest common dominator is a BB in a loop nest like
> >
> >   loop {
> >     loop {
> >     }
> >
> >     loop {
> >       BB;
> >     }
> >     BB';
> >   }
> >
> > this will skip an arbitrary number of earlier sibling loops.  I think
> > if we want to do such additional hoisting at all - for a splat of a
> > non-constant we have to ensure the set of the source we splat is still
> > dominating the insertion point (where's that done?) - it IMO only
> > makes sense (without extra costing) to hoist the set out of a perfect
> > nest, thus never across earlier sibling loops.  Even for BB' this is
> > likely problematic.
>
> Since my patch works, I'd like to keep it as is.  Will it work for you?

Sure.  Where is it ensured the splat isn't inserted before it is set?
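
To make the two concerns above concrete, here is a minimal sketch on a toy
model. It is not GCC code: ToyCfg, ToyLoop, hoist_unguarded, hoist_guarded,
make_example and the block numbering are hypothetical stand-ins, and the
dominator and loop information is written out by hand rather than computed.
The example CFG follows the loop nest quoted above, with block 5 playing the
role of BB and block 6 the role of BB'. The guarded variant is only one
possible way to answer the "where is that done?" question and is not taken
from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's loop structure.
struct ToyLoop {
  int header;  // header block index; -1 for the root pseudo-loop
  int outer;   // enclosing loop index; -1 for the root pseudo-loop
};

// Hypothetical stand-in for the CFG plus dominator and loop info.
struct ToyCfg {
  std::vector<int> idom;       // immediate dominator; the entry maps to itself
  std::vector<int> loop_of;    // innermost loop containing each block
  std::vector<ToyLoop> loops;  // loops[0] is the root (the function body)
};

// Same shape as the quoted while loop: leave one loop level per iteration
// by jumping to the immediate dominator of the current loop's header.
int hoist_unguarded(const ToyCfg &g, int bb) {
  while (g.loop_of[bb] != 0)
    bb = g.idom[g.loops[g.loop_of[bb]].header];
  return bb;
}

// True if block a dominates block b (walks b's dominator chain upwards).
bool dominates(const ToyCfg &g, int a, int b) {
  for (;;) {
    if (b == a) return true;
    if (g.idom[b] == b) return false;  // reached the entry block
    b = g.idom[b];
  }
}

// Assumed guard: stop hoisting as soon as the block that sets the splat
// source (def_bb) would no longer dominate the insertion point.
int hoist_guarded(const ToyCfg &g, int bb, int def_bb) {
  while (g.loop_of[bb] != 0) {
    int next = g.idom[g.loops[g.loop_of[bb]].header];
    if (!dominates(g, def_bb, next)) break;
    bb = next;
  }
  return bb;
}

// Blocks: 0 entry, 1 outer header, 2 first inner loop, 3 block between the
// inner loops, 4 second inner header, 5 BB, 6 BB', 7 exit.
ToyCfg make_example() {
  ToyCfg g;
  g.idom = {0, 0, 1, 2, 3, 4, 4, 1};
  g.loop_of = {0, 1, 2, 1, 3, 3, 1, 0};
  g.loops = {{-1, -1}, {1, 0}, {2, 1}, {4, 1}};
  return g;
}
```

In this model, hoisting from block 5 without a guard ends at block 0, above
the outer loop and therefore above the earlier sibling loop as well. If the
splat source is set in block 3, block 3 does not dominate block 0, so the
unguarded insertion point would precede the set; the guarded variant stops
at block 3 instead.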