https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941

--- Comment #39 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to rguent...@suse.de from comment #38)
> On Wed, 30 Jul 2025, hjl.tools at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
> > 
> > --- Comment #37 from H.J. Lu <hjl.tools at gmail dot com> ---
> > (In reply to Richard Biener from comment #35)
> > > (In reply to H.J. Lu from comment #33)
> > > > Created attachment 61995 [details]
> > > > An updated patch
> > > > 
> > > > Please try this.
> > > 
> > > Looking at the patch I do wonder about
> > > 
> > > static void 
> > > ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs,
> > >                               rtx inner_scalar = nullptr)
> > > {                        
> > >   basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> > >   while (bb->loop_father->latch              
> > >          != EXIT_BLOCK_PTR_FOR_FN (cfun))
> > >     bb = get_immediate_dominator (CDI_DOMINATORS,
> > >                                   bb->loop_father->header);
> > > 
> > > when the nearest common dominator is a BB in a loop nest like
> > > 
> > >  loop {
> > >    loop {
> > >    }
> > > 
> > >    loop {
> > >       BB;
> > >    }
> > >    BB';
> > >  }
> > > 
> > > this will skip an arbitrary number of earlier sibling loops.  I think
> > > if we want to do such additional hoisting at all - for a splat of a
> > > non-constant we have to ensure the set of the source we splat is still
> > > dominating the insertion point (where's that done?) - it IMO only
> > > makes sense (without extra costing) to hoist the set out of a perfect
> > > nest, thus never across earlier sibling loops.  Even for BB' this is
> > > likely problematic.
> > 
> > Since my patch works, I'd like to keep it as is.  Will it work for you?
> 
> Sure.  Where is it ensured the splat isn't inserted before it is set?

With my patch, we got

 basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
  /* For X86_CSE_VEC_DUP, don't place the vector set outside of the loop
     to avoid extra spills.  */
  if (!load || load->kind != X86_CSE_VEC_DUP)
    {    
      while (bb->loop_father->latch
             != EXIT_BLOCK_PTR_FOR_FN (cfun))
        bb = get_immediate_dominator (CDI_DOMINATORS,
                                      bb->loop_father->header);
    }    

When load->kind == X86_CSE_VEC_DUP, bb is the nearest common dominator which
may be inside the loop.

Reply via email to