On Tue, Jun 11, 2024 at 11:46 AM Victor Do Nascimento
<victor.donascime...@arm.com> wrote:
>
> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; ++i){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>    function call due to the possibility of side effects, the
>    autovectorizer does not attempt to vectorize loops which make such
>    calls.
>
>    Given the memory reference passed to `__builtin_prefetch', in the
>    absence of assurances about its effect on the passed memory
>    location the compiler deems the function unsafe to vectorize,
>    marking it as clobbering memory in `vect_find_stmt_data_reference'.
>    This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>    would be classed as `vect_unused_in_scope', the loop invariant that
>    is used in the address of the prefetch is the scalar loop's and not
>    the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>    because the instruction wasn't vectorized, causing DCE to think the
>    value is live, such that we now have both the vector and scalar loop
>    invariant actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>    not generate faults if its address argument is invalid and does not
>    write to memory.  Therefore, it does not alter the internal state
>    of the program or its control flow under any circumstance.  As
>    such, it is reasonable that the function be marked as not affecting
>    memory contents.
>
>    To achieve this, we add the necessary logic to
>    `get_references_in_stmt' to ensure that builtin functions are given
>    given the same treatment as internal functions.  If the gimple call
>    is to a builtin function and its function code is
>    `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedence in the way clobber statements are handled,
>    whereby the vectorizer drops these from both the scalar and
>    vectorized versions of a given loop, we choose to drop prefetch
>    hints in a similar fashion.  This seems appropriate given how
>    software prefetch hints are typically ignored by processors across
>    architectures, as they seldom lead to performance gain over their
>    hardware counterparts.

OK.

Thanks,
Richard.

>    PR tree-optimization/114061
>
> gcc/ChangeLog:
>
>         * tree-data-ref.cc (get_references_in_stmt): set
>         `clobbers_memory' to false for __builtin_prefetch.
>         * tree-vect-loop.cc (vect_transform_loop): Drop all
>         __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-prefetch-drop.c: New test.
>         * gcc.target/aarch64/vect-prefetch-drop.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c      | 12 ++++++++++++
>  .../gcc.target/aarch64/vect-prefetch-drop.c         | 13 +++++++++++++
>  gcc/tree-data-ref.cc                                |  2 ++
>  gcc/tree-vect-loop.cc                               |  6 ++++--
>  4 files changed, 31 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c 
> b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 00000000000..7a8915eb716
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void foo(int * restrict a, int * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> new file mode 100644
> index 00000000000..e654b99fde8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv9.2-a+sve --std=c99" { target { 
> aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, 
> z\[0-9\]+.d" } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index 7b5f2d16238..bd61069b631 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5868,6 +5868,8 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, 
> va_heap> *references)
>             clobbers_memory = true;
>             break;
>           }
> +      else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
> +       clobbers_memory = false;
>        else
>         clobbers_memory = true;
>      }
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index c471f1564a7..89cc6e64589 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12177,8 +12177,10 @@ vect_transform_loop (loop_vec_info loop_vinfo, 
> gimple *loop_vectorized_call)
>            !gsi_end_p (si);)
>         {
>           stmt = gsi_stmt (si);
> -         /* During vectorization remove existing clobber stmts.  */
> -         if (gimple_clobber_p (stmt))
> +         /* During vectorization remove existing clobber stmts and
> +            prefetches.  */
> +         if (gimple_clobber_p (stmt)
> +             || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
>             {
>               unlink_stmt_vdef (stmt);
>               gsi_remove (&si, true);
> --
> 2.34.1
>

Reply via email to