On Fri, 6 Jun 2025, Pengfei Li wrote:

> Current GCC uses either peeling or versioning, but not in combination,
> to handle unaligned data references (DRs) during vectorization. This
> limitation causes some loops with an early break to fall back to scalar
> code at runtime.
> 
> Consider the following loop with DRs in its early break condition:
> 
>       for (int i = start; i < end; i++) {
>         if (a[i] == b[i])
>           break;
>         count++;
>       }
> 
> In the loop, references to a[] and b[] need to be strictly aligned for
> vectorization because speculative reads that may cross page boundaries
> are not allowed. Current GCC does versioning for this loop by creating a
> runtime check like:
> 
>       ((&a[start] | &b[start]) & mask) == 0
> 
> to see if the two start addresses both have zero low-order bits. If the
> above runtime check fails, the loop falls back to scalar code. However,
> it is often the case that the DRs are all unaligned at the beginning but
> become aligned together after a few loop iterations. We call this
> situation the DRs being "mutually aligned".
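> 
> For instance (illustrative numbers, not part of the patch): with 16-byte
> vectors (mask = 15) and int elements, start addresses &a[start] = 0x1008
> and &b[start] = 0x2008 fail the check above, yet peeling two iterations
> advances both addresses by 8 bytes and aligns them both:
> 
>       /* Illustrative values only; mask = vector size - 1 = 15.  */
>       (0x1008 | 0x2008) & 15   /* = 8: check fails, scalar fallback.  */
>       /* After peeling 2 int iterations (+8 bytes on each address):  */
>       (0x1010 | 0x2010) & 15   /* = 0: both references now aligned.  */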
> 
> This patch enables combined peeling and versioning to avoid loops with
> mutually aligned DRs falling back to scalar code. Specifically, the
> function vect_peeling_supportable is updated in this patch to return a
> three-state enum indicating how peeling can make all unsupportable DRs
> aligned. In addition to the previous true/false return values, a new state
> peeling_maybe_supported is used to indicate that peeling may be able to
> make these DRs aligned, but this cannot be proven at compile time. In
> this case, peeling should be combined with versioning so that a runtime
> check will be generated to guard the peeled vectorized loop.
> 
> A new type of runtime check is also introduced for combined peeling and
> versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true.
> The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
> have the same low-order address bits. For the loop above, the new test
> generates an XOR between the two addresses, like:
> 
>       ((&a[start] ^ &b[start]) & mask) == 0
> 
> Therefore, if a and b have the same alignment step (element size) and
> the same offset from an alignment boundary, a peeled vectorized loop
> will run. This new runtime check also works for more than two DRs, with
> the LHS expression being:
> 
>       ((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask
> 
> where ai is the address of the i'th DR.
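> 
> As a sketch of the semantics only (this is not the GIMPLE the patch
> builds, and the helper name below is hypothetical), the check for n
> addresses behaves like:
> 
>       #include <stdbool.h>
>       #include <stdint.h>
> 
>       /* Return true iff all N addresses share the same low bits under
>          MASK, i.e. they are mutually aligned.  Assumes N >= 2.  */
>       static bool
>       mutually_aligned_p (const uintptr_t *addr, int n, uintptr_t mask)
>       {
>         uintptr_t acc = 0;
>         for (int i = 1; i < n; i++)
>           acc |= addr[i - 1] ^ addr[i];   /* OR of neighbouring XORs.  */
>         return (acc & mask) == 0;
>       }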
> 
> This patch is bootstrapped and regression tested on x86_64-linux-gnu,
> arm-linux-gnueabihf and aarch64-linux-gnu.

Thanks for working on this.

The patch is OK.

Thanks,
Richard.


> gcc/ChangeLog:
> 
>       * tree-vect-data-refs.cc (vect_peeling_supportable): Return new
>       enum values to indicate if combined peeling and versioning can
>       potentially support vectorization.
>       (vect_enhance_data_refs_alignment): Support combined peeling and
>       versioning in vectorization analysis.
>       * tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
>       Add a new type of runtime check for mutually aligned DRs.
>       * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
>       default value of allow_mutual_alignment in the initializer list.
>       * tree-vectorizer.h (enum peeling_support): Define type of
>       peeling support for function vect_peeling_supportable.
>       (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.
> ---
>  .../gcc.dg/vect/vect-early-break_133_pfa6.c   |   2 +-
>  gcc/tree-vect-data-refs.cc                    | 168 ++++++++++++++----
>  gcc/tree-vect-loop-manip.cc                   |  96 +++++++---
>  gcc/tree-vect-loop.cc                         |   1 +
>  gcc/tree-vectorizer.h                         |  16 ++
>  5 files changed, 222 insertions(+), 61 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> index ee123df6ed2..7787d037d9d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> @@ -20,4 +20,4 @@ unsigned test4(char x, char *vect_a, char *vect_b, int n)
>   return ret;
>  }
>  
> -/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> +/* { dg-final { scan-tree-dump "Both peeling and versioning will be applied" "vect" } } */
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 1792ee4ea05..befdbff29f3 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -2111,9 +2111,10 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
>     return res;
>  }
>  
> -/* Return true if the new peeling NPEEL is supported.  */
> +/* Return whether vectorization is definitely, possibly, or unlikely to be
> +   supportable after loop peeling.  */
>  
> -static bool
> +static enum peeling_support
>  vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
>                         unsigned npeel)
>  {
> @@ -2123,8 +2124,11 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
>    bool dr0_alignment_known_p
>      = known_alignment_for_access_p (dr0_info,
>                                   STMT_VINFO_VECTYPE (dr0_info->stmt));
> +  bool has_unsupported_dr_p = false;
> +  unsigned int dr0_step = DR_STEP_ALIGNMENT (dr0_info->dr);
> +  int known_unsupported_misalignment = DR_MISALIGNMENT_UNKNOWN;
>  
> -  /* Ensure that all data refs can be vectorized after the peel.  */
> +  /* Check if each data ref can be vectorized after peeling.  */
>    for (data_reference *dr : datarefs)
>      {
>        if (dr == dr0_info->dr)
> @@ -2152,10 +2156,44 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
>       = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype,
>                                        misalignment);
>        if (supportable_dr_alignment == dr_unaligned_unsupported)
> -     return false;
> +     {
> +       has_unsupported_dr_p = true;
> +
> +       /* If unaligned unsupported DRs exist, we do the following checks to see
> +          if they can be mutually aligned to support vectorization.  If yes,
> +          we can try peeling and create a runtime (mutual alignment) check
> +          to guard the peeled loop.  If no, return PEELING_UNSUPPORTED.  */
> +
> +       /* 1) If unaligned unsupported DRs have different alignment steps, the
> +             probability of DRs being mutually aligned is very low, and it's
> +             quite complex to check mutual alignment at runtime.  We return
> +             PEELING_UNSUPPORTED in this case.  */
> +       if (DR_STEP_ALIGNMENT (dr) != dr0_step)
> +         return peeling_unsupported;
> +
> +       /* 2) Given the same-alignment-step condition above, if one known
> +             misaligned DR has zero misalignment, or a misalignment amount
> +             different from that of another known misaligned DR, peeling
> +             cannot make all these DRs aligned together.  We won't try
> +             peeling with versioning anymore.  */
> +       int curr_dr_misalignment = dr_misalignment (dr_info, vectype);
> +       if (curr_dr_misalignment == 0)
> +         return peeling_unsupported;
> +       if (known_unsupported_misalignment != DR_MISALIGNMENT_UNKNOWN)
> +         {
> +           if (curr_dr_misalignment != DR_MISALIGNMENT_UNKNOWN
> +               && curr_dr_misalignment != known_unsupported_misalignment)
> +             return peeling_unsupported;
> +         }
> +       else
> +         known_unsupported_misalignment = curr_dr_misalignment;
> +     }
>      }
>  
> -  return true;
> +  /* Vectorization is known to be supportable with peeling alone when there is
> +     no unsupported DR.  */
> +  return has_unsupported_dr_p ? peeling_maybe_supported
> +                           : peeling_known_supported;
>  }
>  
>  /* Compare two data-references DRA and DRB to group them into chunks
> @@ -2264,20 +2302,20 @@ dr_align_group_sort_cmp (const void *dra_, const void *drb_)
>       }
>  
>       -- Possibility 3: combination of loop peeling and versioning:
> -     for (i = 0; i < 3; i++){        # (scalar loop, not to be vectorized).
> -     x = q[i];
> -     p[i] = y;
> -     }
> -     if (p is aligned) {
> -     for (i = 3; i<N; i++){  # loop 3A
> +     if (p & q are mutually aligned) {
> +     for (i=0; i<3; i++){    # (peeled loop iterations).
> +       x = q[i];
> +       p[i] = y;
> +     }
> +     for (i=3; i<N; i++){    # loop 3A
>         x = q[i];                     # DR_MISALIGNMENT(q) = 0
>         p[i] = y;                     # DR_MISALIGNMENT(p) = 0
>       }
>       }
>       else {
> -     for (i = 3; i<N; i++){  # loop 3B
> -       x = q[i];                     # DR_MISALIGNMENT(q) = 0
> -       p[i] = y;                     # DR_MISALIGNMENT(p) = unaligned
> +     for (i=0; i<N; i++){    # (scalar loop, not to be vectorized).
> +       x = q[i];                     # DR_MISALIGNMENT(q) = 3
> +       p[i] = y;                     # DR_MISALIGNMENT(p) = unknown
>       }
>       }
>  
> @@ -2296,6 +2334,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>    unsigned int i;
>    bool do_peeling = false;
>    bool do_versioning = false;
> +  bool try_peeling_with_versioning = false;
>    unsigned int npeel = 0;
>    bool one_misalignment_known = false;
>    bool one_misalignment_unknown = false;
> @@ -2361,30 +2400,38 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>    /* While cost model enhancements are expected in the future, the high level
>       view of the code at this time is as follows:
>  
> -     A) If there is a misaligned access then see if peeling to align
> -        this access can make all data references satisfy
> -        vect_supportable_dr_alignment.  If so, update data structures
> -        as needed and return true.
> +     A) If there is a misaligned access then see if doing peeling alone can
> +     make all data references satisfy vect_supportable_dr_alignment.  If so,
> +     update data structures and return.
> +
> +     B) If peeling alone wasn't possible and there is a data reference with an
> +     unknown misalignment that does not satisfy vect_supportable_dr_alignment
> +     then we may use either of the following two approaches.
>  
> -     B) If peeling wasn't possible and there is a data reference with an
> -        unknown misalignment that does not satisfy vect_supportable_dr_alignment
> -        then see if loop versioning checks can be used to make all data
> -        references satisfy vect_supportable_dr_alignment.  If so, update
> -        data structures as needed and return true.
> +     B1) Try peeling with versioning: Add a runtime loop versioning check to
> +         see if all unsupportable data references are mutually aligned, which
> +         means they will be uniformly aligned after a certain amount of loop
> +         peeling.  If peeling and versioning can be used together, set
> +         LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT to TRUE and return.
>  
> -     C) If neither peeling nor versioning were successful then return false if
> -        any data reference does not satisfy vect_supportable_dr_alignment.
> +     B2) Try versioning alone: Add a runtime loop versioning check to see if
> +         all unsupportable data references are already uniformly aligned
> +         without loop peeling.  If versioning can be applied alone, set
> +         LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT to FALSE and return.
>  
> -     D) Return true (all data references satisfy vect_supportable_dr_alignment).
> +     B1 above is more powerful and more likely to be adopted than B2.  But B2
> +     is still available and useful in some cases, for example, when the cost
> +     model does not allow much peeling.
>  
> -     Note, Possibility 3 above (which is peeling and versioning together) is not
> -     being done at this time.  */
> +     C) If none of the above was successful then the alignment was not enhanced,
> +     just return.  */
>  
>    /* (1) Peeling to force alignment.  */
>  
> -  /* (1.1) Decide whether to perform peeling, and how many iterations to peel:
> +  /* (1.1) Decide whether to perform peeling, how many iterations to peel, and
> +     if vectorization may be supported by peeling with versioning.
>       Considerations:
> -     + How many accesses will become aligned due to the peeling
> +     - How many accesses will become aligned due to the peeling
>       - How many accesses will become unaligned due to the peeling,
>         and the cost of misaligned accesses.
>       - The cost of peeling (the extra runtime checks, the increase
> @@ -2732,9 +2779,27 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>                               "Try peeling by %d\n", npeel);
>          }
>  
> -      /* Ensure that all datarefs can be vectorized after the peel.  */
> -      if (!vect_peeling_supportable (loop_vinfo, dr0_info, npeel))
> -     do_peeling = false;
> +      /* Check how peeling for alignment can support vectorization.  Function
> +      vect_peeling_supportable returns one of the three possible values:
> +      - PEELING_KNOWN_SUPPORTED: indicates that we know all unsupported
> +        datarefs can be aligned after peeling.  We can use peeling alone.
> +      - PEELING_MAYBE_SUPPORTED: indicates that peeling may be able to make
> +        these datarefs aligned but we are not sure about it at compile time.
> +        We will try peeling with versioning to add a runtime check to guard
> +        the peeled loop.
> +      - PEELING_UNSUPPORTED: indicates that peeling is very unlikely to make
> +        vectorization supportable.  We stop trying to peel.  */
> +      switch (vect_peeling_supportable (loop_vinfo, dr0_info, npeel))
> +     {
> +     case peeling_known_supported:
> +       break;
> +     case peeling_maybe_supported:
> +       try_peeling_with_versioning = true;
> +       break;
> +     case peeling_unsupported:
> +       do_peeling = false;
> +       break;
> +     }
>  
>        /* Check if all datarefs are supportable and log.  */
>        if (do_peeling
> @@ -2811,7 +2876,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>  
>               vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
>             }
> +     }
>  
> +      if (do_peeling && !try_peeling_with_versioning)
> +     {
> +       /* Update data structures if peeling will be applied alone.  */
>            LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
>            if (npeel)
>              LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = npeel;
> @@ -2939,6 +3008,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>          LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).truncate (0);
>      }
>  
> +  /* If we are trying peeling with versioning but versioning is disabled for
> +     some reason, peeling must be turned off as well.  */
> +  if (try_peeling_with_versioning && !do_versioning)
> +    do_peeling = false;
> +
>    if (do_versioning)
>      {
>        const vec<stmt_vec_info> &may_misalign_stmts
> @@ -2958,12 +3032,28 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>                               "Alignment of access forced using versioning.\n");
>          }
>  
> -      if (dump_enabled_p ())
> -        dump_printf_loc (MSG_NOTE, vect_location,
> -                         "Versioning for alignment will be applied.\n");
> -
> -      /* Peeling and versioning can't be done together at this time.  */
> -      gcc_assert (! (do_peeling && do_versioning));
> +      if (do_peeling)
> +     {
> +       /* This point is reached if peeling and versioning are used together
> +          to ensure alignment.  Update data structures to make sure the loop
> +          is correctly peeled and the right runtime check is added for loop
> +          versioning.  */
> +       gcc_assert (try_peeling_with_versioning);
> +       LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
> +       LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = -1;
> +       LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo) = true;
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "Both peeling and versioning will be applied.\n");
> +     }
> +      else
> +     {
> +       /* This point is reached if versioning is used alone.  */
> +       LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo) = false;
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "Versioning for alignment will be applied.\n");
> +     }
>  
>        return opt_result::success ();
>      }
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 56a4e9a8b63..18f7c1bcea6 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3787,10 +3787,11 @@ chain_cond_expr (tree *cond_expr, tree part_cond_expr)
>  
>     Input:
>     COND_EXPR  - input conditional expression.  New conditions will be chained
> -                with logical AND operation.
> -   LOOP_VINFO - two fields of the loop information are used.
> -                LOOP_VINFO_PTR_MASK is the mask used to check the alignment.
> -                LOOP_VINFO_MAY_MISALIGN_STMTS contains the refs to be checked.
> +             with logical AND operation.
> +   LOOP_VINFO - three fields of the loop information are used.
> +             LOOP_VINFO_PTR_MASK is the mask used to check the alignment.
> +             LOOP_VINFO_MAY_MISALIGN_STMTS contains the refs to be checked.
> +             LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT indicates which check applies.
>  
>     Output:
>     COND_EXPR_STMT_LIST - statements needed to construct the conditional
> @@ -3798,7 +3799,20 @@ chain_cond_expr (tree *cond_expr, tree part_cond_expr)
>     The returned value is the conditional expression to be used in the if
>    statement that controls which version of the loop gets executed at runtime.
>  
> -   The algorithm makes two assumptions:
> +   Based on the boolean value of LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT, we decide
> +   which type of check should be applied and create one of two different
> +   expressions accordingly.
> +     1) When LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is false, we see if all data refs
> +     to be checked are already aligned to an alignment boundary.  We create
> +     an expression of "(a_1 | a_2 | a_3 | ... | a_n) & mask", where "a_i" is
> +     the address of the i'th data reference.
> +     2) When LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true, we see if all data refs
> +     can be aligned to a boundary after a certain amount of peeling; in other
> +     words, their addresses have the same bottom bits according to the mask.
> +     We create "((a_1 ^ a_2) | (a_2 ^ a_3) | ... | (a_n-1 ^ a_n)) & mask",
> +     where "a_i" is the address of the i'th data reference.
> +
> +   Both algorithms make two assumptions:
>       1) The number of bytes "n" in a vector is a power of 2.
>       2) An address "a" is aligned if a%n is zero and that this
>          test can be done as a&(n-1) == 0.  For example, for 16
> @@ -3818,6 +3832,7 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
>    tree int_ptrsize_type;
>    char tmp_name[20];
>    tree or_tmp_name = NULL_TREE;
> +  tree prev_addr_tmp_name = NULL_TREE;
>    tree and_tmp_name;
>    gimple *and_stmt;
>    tree ptrsize_zero;
> @@ -3829,16 +3844,19 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
>  
>    int_ptrsize_type = signed_type_for (ptr_type_node);
>  
> -  /* Create expression (mask & (dr_1 || ... || dr_n)) where dr_i is the address
> -     of the first vector of the i'th data reference. */
> +  /* If LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true, we should have at least two
> +     datarefs to check the mutual alignment.  */
> +  gcc_assert (may_misalign_stmts.length () > 1
> +           || !LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo));
>  
>    FOR_EACH_VEC_ELT (may_misalign_stmts, i, stmt_info)
>      {
>        gimple_seq new_stmt_list = NULL;
>        tree addr_base;
>        tree addr_tmp_name;
> +      tree xor_tmp_name;
>        tree new_or_tmp_name;
> -      gimple *addr_stmt, *or_stmt;
> +      gimple *addr_stmt, *or_stmt, *xor_stmt;
>        tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>        bool negative = tree_int_cst_compare
>       (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)), size_zero_node) < 0;
> @@ -3860,20 +3878,56 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
>        addr_stmt = gimple_build_assign (addr_tmp_name, NOP_EXPR, addr_base);
>        gimple_seq_add_stmt (cond_expr_stmt_list, addr_stmt);
>  
> -      /* The addresses are OR together.  */
> -
> -      if (or_tmp_name != NULL_TREE)
> -        {
> -          /* create: or_tmp = or_tmp | addr_tmp */
> -          sprintf (tmp_name, "orptrs%d", i);
> -       new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL, tmp_name);
> -       or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
> -                                      or_tmp_name, addr_tmp_name);
> -       gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
> -          or_tmp_name = new_or_tmp_name;
> -        }
> +      if (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo))
> +     {
> +       /* Create "((a_1 ^ a_2) | (a_2 ^ a_3) | ... | (a_n-1 ^ a_n)) & mask"
> +          to check mutual alignment.  */
> +       if (prev_addr_tmp_name != NULL_TREE)
> +         {
> +           sprintf (tmp_name, "xorptrs%d_%d", i - 1, i);
> +           xor_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
> +                                              tmp_name);
> +           xor_stmt = gimple_build_assign (xor_tmp_name, BIT_XOR_EXPR,
> +                                           prev_addr_tmp_name,
> +                                           addr_tmp_name);
> +           gimple_seq_add_stmt (cond_expr_stmt_list, xor_stmt);
> +           if (or_tmp_name == NULL_TREE)
> +             {
> +               /* Create the 1st XOR when the 2nd data ref is seen.  */
> +               or_tmp_name = xor_tmp_name;
> +             }
> +           else
> +             {
> +               /* Create: or_tmp = or_tmp | new_xor_tmp.  */
> +               sprintf (tmp_name, "orxors%d", i - 1);
> +               new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
> +                                                     tmp_name);
> +               or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
> +                                              or_tmp_name, xor_tmp_name);
> +               gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
> +               or_tmp_name = new_or_tmp_name;
> +             }
> +         }
> +       prev_addr_tmp_name = addr_tmp_name;
> +     }
>        else
> -        or_tmp_name = addr_tmp_name;
> +     {
> +       /* Create: "(a_1 | a_2 | a_3 | ... | a_n) & mask" to check if all
> +          addresses are already aligned.  */
> +       if (or_tmp_name != NULL_TREE)
> +         {
> +           /* Create: or_tmp = or_tmp | addr_tmp.  */
> +           sprintf (tmp_name, "orptrs%d", i);
> +           new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
> +                                                 tmp_name);
> +           or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
> +                                          or_tmp_name, addr_tmp_name);
> +           gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
> +           or_tmp_name = new_or_tmp_name;
> +         }
> +       else
> +         or_tmp_name = addr_tmp_name;
> +     }
>  
>      } /* end for i */
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8c5761d3c55..76c17e75f2e 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1069,6 +1069,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      using_decrementing_iv_p (false),
>      using_select_vl_p (false),
>      epil_using_partial_vectors_p (false),
> +    allow_mutual_alignment (false),
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 7aa2b02b63c..6ccafaf18e6 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -56,6 +56,14 @@ enum dr_alignment_support {
>    dr_aligned
>  };
>  
> +/* Define type of peeling support to indicate how peeling for alignment can help
> +   make vectorization supported.  */
> +enum peeling_support {
> +  peeling_known_supported,
> +  peeling_maybe_supported,
> +  peeling_unsupported
> +};
> +
>  /* Define type of def-use cross-iteration cycle.  */
>  enum vect_def_type {
>    vect_uninitialized_def = 0,
> @@ -946,6 +954,13 @@ public:
>       epilogue of loop.  */
>    bool epil_using_partial_vectors_p;
>  
> +  /* True if we've decided to use peeling and versioning together, which allows
> +     unaligned unsupported data refs to be uniformly aligned after a certain
> +     amount of peeling (mutual alignment).  Otherwise, we use versioning alone
> +     so these data refs must be already aligned to a power-of-two boundary
> +     without peeling.  */
> +  bool allow_mutual_alignment;
> +
>    /* The bias for len_load and len_store.  For now, only 0 and -1 are
>       supported.  -1 must be used when a backend does not support
>       len_load/len_store with a length of zero.  */
> @@ -1070,6 +1085,7 @@ public:
>  #define LOOP_VINFO_USING_SELECT_VL_P(L) (L)->using_select_vl_p
>  #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L)                           \
>    (L)->epil_using_partial_vectors_p
> +#define LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT(L) (L)->allow_mutual_alignment
>  #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias
>  #define LOOP_VINFO_VECT_FACTOR(L)          (L)->vectorization_factor
>  #define LOOP_VINFO_MAX_VECT_FACTOR(L)      (L)->max_vectorization_factor
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
