On Fri, Sep 27, 2024 at 6:27 AM Pietro Monteiro
<pie...@sociotechnical.xyz> wrote:
>
> The prefetch instruction that is emitted by __builtin_prefetch is re-ordered
> on GCC, but not on clang[0]. GCC's behavior is surprising because when using
> the builtin you want the instruction to be placed at the exact point where
> you put it. Moving it around, especially across loads/stores, may end up being
> a pessimization. Adding a blockage instruction before the prefetch prevents
> the scheduler from moving it.
>
> [0] https://godbolt.org/z/Ycjr7Tq8b

I think the testcase is quite broken (i.e. not real-world).  I would also suggest
that a hard scheduling barrier isn't the correct tool (see Oleg's response);
instead, prefetch should properly model a data dependence so that it only blocks
code motion for dependent accesses.  On the GIMPLE side, disambiguation
happens solely based on pointer analysis; on RTL, where prefetch is likely
an UNSPEC, I would suggest modeling the dependence as a USE/CLOBBER
pair of a MEM.
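
As an illustration of the distinction, here is a minimal sketch of a more
typical prefetch use (this is not the testcase from [0]; the names
sum_with_prefetch and PREFETCH_AHEAD are hypothetical and the distance is
an untuned assumption).  The load of a[i] does not touch the prefetched
element, so a dependence-based model would presumably leave the scheduler
free to move it, while a blockage would pin everything around the prefetch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; would need tuning per target. */
enum { PREFETCH_AHEAD = 16 };

/* Sum an array while prefetching elements a fixed distance ahead. */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint only: rw = 0 (read), locality = 3 (high temporal locality). */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);

        /* This load reads a[i], not the prefetched element, so it is
           independent of the prefetch above. */
        sum += a[i];
    }
    return sum;
}
```

Only an access to a[i + PREFETCH_AHEAD] itself would be a dependent access
in the sense described above.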

Richard.

> -- 8< --
>
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 37c7c98e5c..fec751e0d6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
>        create_integer_operand (&ops[1], INTVAL (op1));
>        create_integer_operand (&ops[2], INTVAL (op2));
>        if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> -       return;
> +        {
> +          /* Prevent the prefetch from being moved.  */
> +          rtx_insn *last = get_last_insn ();
> +          emit_insn_before (gen_blockage (), last);
> +          return;
> +        }
>      }
>
>    /* Don't do anything with direct references to volatile memory, but
