> Hi,
> 
> Can the issue be resolved in a target-independent manner as suggested below?
> Or is it better to deal with this in the target code?
> 
> Best regards,
> Oleg Endo
> 
> On Fri, 2024-09-27 at 00:26 -0400, Pietro Monteiro wrote:
> > The prefetch instruction that is emitted by __builtin_prefetch is
> > re-ordered on GCC, but not on clang[0]. GCC's behavior is surprising
> > because when using the builtin you want the instruction to be placed at
> > the exact point where you put it. Moving it around, especially across
> > loads/stores, may end up being a pessimization. Adding a blockage
> > instruction before the prefetch prevents the scheduler from moving it.
> > 
> > [0] https://godbolt.org/z/Ycjr7Tq8b
> > 
> > 
> > -- 8< --
> > 
> > 
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 37c7c98e5c..fec751e0d6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
> >        create_integer_operand (&ops[1], INTVAL (op1));
> >        create_integer_operand (&ops[2], INTVAL (op2));
> >        if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> > -   return;
> > +        {
> > +          /* Prevent the prefetch from being moved.  */
> > +          rtx_insn *last = get_last_insn ();
> > +          emit_insn_before (gen_blockage (), last);
> > +          return;
> > +        }
> >      }
> >  
> >    /* Don't do anything with direct references to volatile memory, but
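
For readers following the thread, here is a minimal sketch of the kind of code where the placement of the prefetch matters. It is not taken from the patch or from the godbolt link above; the function name `sum_with_prefetch` and the prefetch distance of 16 elements are assumptions chosen only for illustration. The prefetch is written immediately before the load it is meant to run ahead of, which is the ordering the quoted message says the scheduler may change.

```c
#include <stddef.h>

/* Hypothetical example: sum an array while prefetching a fixed
   distance ahead of the element currently being loaded.  */
long sum_with_prefetch(const long *data, size_t n)
{
    const size_t ahead = 16;  /* assumed prefetch distance, in elements */
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        /* rw = 0 (prefetch for read), locality = 3 (keep in all cache
           levels).  The intent is for this to be issued before the
           load of data[i] below.  */
        if (i + ahead < n)
            __builtin_prefetch(&data[i + ahead], 0, 3);

        total += data[i];
    }
    return total;
}
```

The prefetch has no effect on the computed result, so any reordering shows up only in the generated assembly and in timing, not in the program's output.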
