On 5/16/24 15:16, Andrew Pinski wrote:


On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento <victor.donascime...@arm.com <mailto:victor.donascime...@arm.com>> wrote:

    At present the autovectorizer fails to vectorize simple loops
    involving calls to `__builtin_prefetch'.  A simple example of such
    loop is given below:

    void foo(double * restrict a, double * restrict b, int n){
       int i;
       for(i=0; i<n; ++i){
         a[i] = a[i] + b[i];
         __builtin_prefetch(&(b[i+8]));
       }
    }

    The failure stems from two issues:

    1. Given that it is typically not possible to fully reason about a
        function call due to the possibility of side effects, the
        autovectorizer does not attempt to vectorize loops which make such
        calls.

        Given the memory reference passed to `__builtin_prefetch', in the
        absence of assurances about its effect on the passed memory
        location the compiler deems the function unsafe to vectorize,
        marking it as clobbering memory in `vect_find_stmt_data_reference'.
        This leads to the failure in autovectorization.

    2. Notwithstanding the above issue, though the prefetch statement
        would be classed as `vect_unused_in_scope', the loop invariant that
        is used in the address of the prefetch is the scalar loop's and not
        the vector loop's IV. That is, it still uses `i' and not `vec_iv'
        because the instruction wasn't vectorized, causing DCE to think the
        value is live, such that we now have both the vector and scalar loop
        invariant actively used in the loop.

    This patch addresses both of these:

    1. About the issue regarding the memory clobber, data prefetch does
        not generate faults if its address argument is invalid and does not
        write to memory.  Therefore, it does not alter the internal state
        of the program or its control flow under any circumstance.  As
        such, it is reasonable that the function be marked as not affecting
        memory contents.

        To achieve this, we add the necessary logic to
        `get_references_in_stmt' to ensure that builtin functions are given
        given the same treatment as internal functions.  If the gimple call
        is to a builtin function and its function code is
        `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.

    2. Finding precedence in the way clobber statements are handled,
        whereby the vectorizer drops these from both the scalar and
        vectorized versions of a given loop, we choose to drop prefetch
        hints in a similar fashion.  This seems appropriate given how
        software prefetch hints are typically ignored by processors across
        architectures, as they seldom lead to performance gain over their
        hardware counterparts.

        PR target/114061


This most likely be tree-optimization/114061 since it is a generic vectorizer issue. Oh maybe reference the bug # in summary next time just for easier reference.

Thanks,
Andrew

My bad.

You're right, it's tree-optimization/114061.  Thanks for catching this.

Cheers,
Victor


    gcc/ChangeLog:

             * tree-data-ref.cc (get_references_in_stmt): set
             `clobbers_memory' to false for __builtin_prefetch.
             * tree-vect-loop.cc (vect_transform_loop): Drop all
             __builtin_prefetch calls from loops.

    gcc/testsuite/ChangeLog:

             * gcc.dg/vect/vect-prefetch-drop.c: New test.
    ---
      gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++++++++++++++
      gcc/tree-data-ref.cc                           |  9 +++++++++
      gcc/tree-vect-loop.cc                          |  7 ++++++-
      3 files changed, 29 insertions(+), 1 deletion(-)
      create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c

    diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
    b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
    new file mode 100644
    index 00000000000..57723a8c972
    --- /dev/null
    +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
    @@ -0,0 +1,14 @@
    +/* { dg-do compile { target { aarch64*-*-* } } } */
    +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve
    -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
    +
    +void foo(double * restrict a, double * restrict b, int n){
    +  int i;
    +  for(i=0; i<n; ++i){
    +    a[i] = a[i] + b[i];
    +    __builtin_prefetch(&(b[i+8]));
    +  }
    +}
    +
    +/* { dg-final { scan-assembler-not "prfm" } } */
    +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m,
    z\[0-9\]+.d, z\[0-9\]+.d" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
    diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
    index f37734b5340..47bfec0f922 100644
    --- a/gcc/tree-data-ref.cc
    +++ b/gcc/tree-data-ref.cc
    @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt,
    vec<data_ref_loc, va_heap> *references)
                 clobbers_memory = true;
                 break;
               }
    +
    +      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    +       {
    +         enum built_in_function fn_type = DECL_FUNCTION_CODE
    (TREE_OPERAND (gimple_call_fn (stmt), 0));
    +         if (fn_type == BUILT_IN_PREFETCH)
    +           clobbers_memory = false;
    +         else
    +           clobbers_memory = true;
    +       }
            else
             clobbers_memory = true;
          }
    diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
    index 361aec06488..65e8b421d80 100644
    --- a/gcc/tree-vect-loop.cc
    +++ b/gcc/tree-vect-loop.cc
    @@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info
    loop_vinfo, gimple *loop_vectorized_call)
                !gsi_end_p (si);)
             {
               stmt = gsi_stmt (si);
    -         /* During vectorization remove existing clobber stmts.  */
    +         /* During vectorization remove existing clobber stmts and
    +            prefetches.  */
               if (gimple_clobber_p (stmt))
                 {
                   unlink_stmt_vdef (stmt);
                   gsi_remove (&si, true);
                   release_defs (stmt);
                 }
    +         else if (gimple_call_builtin_p (stmt) &&
    +                  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn
    (stmt),
    +                                                    0)) ==
    BUILT_IN_PREFETCH)
    +               gsi_remove (&si, true);
               else
                 {
                   /* Ignore vector stmts created in the outer loop.  */
-- 2.34.1

Reply via email to