On Fri, Sep 27, 2024 at 6:27 AM Pietro Monteiro <pie...@sociotechnical.xyz> wrote: > > The prefetch instruction that is emitted by __builtin_prefetch is re-ordered > on GCC, but not on clang[0]. GCC's behavior is surprising because when using > the builtin you want the instruction to be placed at the exact point where > you put it. Moving it around, specially across load/stores, may end up being > a pessimization. Adding a blockage instruction before the prefetch prevents > the scheduler from moving it. > > [0] https://godbolt.org/z/Ycjr7Tq8b
I think the testcase is quite broken (aka not real-world). I would also suggest that a hard scheduling barrier isn't the correct tool (see Olegs response), but instead prefetch should properly model a data dependence so it only blocks code motion for dependent accesses. On the GIMPLE side disambiguation happens solely based on pointer analysis, on RTL where prefetch is likely an UNSPEC I would suggest to model the dependence as a USE/CLOBBER pair of a MEM. Richard. > -- 8< -- > > > diff --git a/gcc/builtins.cc b/gcc/builtins.cc > index 37c7c98e5c..fec751e0d6 100644 > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp) > create_integer_operand (&ops[1], INTVAL (op1)); > create_integer_operand (&ops[2], INTVAL (op2)); > if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops)) > - return; > + { > + /* Prevent the prefetch from being moved. */ > + rtx_insn *last = get_last_insn (); > + emit_insn_before (gen_blockage (), last); > + return; > + } > } > > /* Don't do anything with direct references to volatile memory, but