On Fri, Nov 17, 2017 at 11:17 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> This patch adds runtime alias checks for loops with variable strides,
> so that we can vectorise them even without a restrict qualifier.
> There are several parts to doing this:
>
> 1) For accesses like:
>
>      x[i * n] += 1;
>
>    we need to check whether n (and thus the DR_STEP) is nonzero.
>    vect_analyze_data_ref_dependence records values that need to be
>    checked in this way, then vect_prune_runtime_alias_test_list records
>    a bounds check on DR_STEP being outside the range [0, 0].
>
> 2) For accesses like:
>
>      x[i * n] = x[i * n + 1] + 1;
>
>    we simply need to test whether abs (n) >= 2.
>    vect_prune_runtime_alias_test_list looks for cases like this and
>    tries to guess whether it is better to use this kind of check or a
>    check for non-overlapping ranges.  (We could do an OR of the two
>    conditions at runtime, but that isn't implemented yet.)
>
> 3) Checks for overlapping ranges need to cope with variable strides.
>    At present the "length" of each segment in a range check is
>    represented as an offset from the base that lies outside the
>    touched range, in the same direction as DR_STEP.  The length
>    can therefore be negative and is sometimes conservative.
>
>    With variable steps it's easier to reason about this if we split
>    it into two:
>
>      seg_len:
>        distance travelled from the first iteration of interest
>        to the last, e.g. DR_STEP * (VF - 1)
>
>      access_size:
>        the number of bytes accessed in each iteration
>
>    with access_size always being a positive constant and seg_len
>    possibly being variable.  We can then combine alias checks
>    for two accesses that are a constant number of bytes apart by
>    adjusting the access size to account for the gap.  This leaves
>    the segment length unchanged, which allows the check to be combined
>    with further accesses.
>
>    When seg_len is positive, the runtime alias check has the form:
>
>         base_a >= base_b + seg_len_b + access_size_b
>      || base_b >= base_a + seg_len_a + access_size_a
>
>    In many accesses the base will be aligned to the access size, which
>    allows us to skip the addition:
>
>         base_a > base_b + seg_len_b
>      || base_b > base_a + seg_len_a
>
>    A similar saving is possible with "negative" lengths.
>
>    The patch therefore tracks the alignment in addition to seg_len
>    and access_size.
>
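
[Illustration, not part of the patch.]  The two forms of the
positive-seg_len check in point 3 can be sketched in plain C.  This is
only a sketch under stated assumptions: the struct and function names
are hypothetical, addresses are modelled as uintptr_t, and overflow of
the additions is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one side of an alias check.  The field names
   follow the description above, not GCC's own data structures.  */
struct seg
{
  uintptr_t base;      /* address of the first access */
  size_t seg_len;      /* distance from the first access to the last */
  size_t access_size;  /* bytes touched by each access, > 0 */
};

/* General form: A and B are independent if one of them ends before the
   other starts.  Address overflow is ignored in this sketch.  */
int
segs_disjoint (struct seg a, struct seg b)
{
  assert (a.access_size > 0 && b.access_size > 0);
  return (a.base >= b.base + b.seg_len + b.access_size
          || b.base >= a.base + a.seg_len + a.access_size);
}

/* Cheaper form, assumed valid only when both bases and both seg_lens
   are multiples of a common access size.  */
int
segs_disjoint_aligned (struct seg a, struct seg b)
{
  assert (a.access_size == b.access_size);
  return (a.base > b.base + b.seg_len
          || b.base > a.base + a.seg_len);
}
```

With 4-byte accesses, base 0 and seg_len 12 cover bytes [0, 16), so a
second segment starting at 16 passes either form of the check and one
starting at 12 fails both.
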
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
> PS. This is the last SVE patch for GCC 8!
>
>
> 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
>             Alan Hayward  <alan.hayw...@arm.com>
>             David Sherwood  <david.sherw...@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vec_lower_bound): New structure.
>         (_loop_vec_info): Add check_nonzero and lower_bounds.
>         (LOOP_VINFO_CHECK_NONZERO): New macro.
>         (LOOP_VINFO_LOWER_BOUNDS): Likewise.
>         (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
>         * tree-data-ref.h (dr_with_seg_len): Add access_size and align
>         fields.  Make seg_len the distance travelled, not including the
>         access size.
>         (dr_direction_indicator): Declare.
>         (dr_zero_step_indicator): Likewise.
>         (dr_known_forward_stride_p): Likewise.
>         * tree-data-ref.c: Include stringpool.h, tree-vrp.h and
>         tree-ssanames.h.
>         (runtime_alias_check_p): Allow runtime alias checks with
>         variable strides.
>         (operator ==): Compare access_size and align.
>         (prune_runtime_alias_test_list): Rework for new distinction between
>         the access_size and seg_len.
>         (create_intersect_range_checks_index): Likewise.  Cope with polynomial
>         segment lengths.
>         (get_segment_min_max): New function.
>         (create_intersect_range_checks): Use it.
>         (dr_step_indicator): New function.
>         (dr_direction_indicator): Likewise.
>         (dr_zero_step_indicator): Likewise.
>         (dr_known_forward_stride_p): Likewise.
>         * tree-loop-distribution.c (data_ref_segment_size): Return
>         DR_STEP * (niters - 1).
>         (compute_alias_check_pairs): Update call to the dr_with_seg_len
>         constructor.
>         * tree-vect-data-refs.c (vect_check_nonzero_value): New function.
>         (vect_preserves_scalar_order_p): New function, split out from...
>         (vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
>         (vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
>         (vect_vfa_access_size): New function.
>         (vect_vfa_align): Likewise.
>         (vect_compile_time_alias): Take access_size_a and access_size_b
>         arguments.
>         (dump_lower_bound): New function.
>         (vect_check_lower_bound): Likewise.
>         (vect_small_gap_p): Likewise.
>         (vectorizable_with_step_bound_p): Likewise.
>         (vect_prune_runtime_alias_test_list): Ignore cross-iteration
>         dependencies if the vectorization factor is 1.  Convert the checks
>         for nonzero steps into checks on the bounds of DR_STEP.  Try using
>         a bounds check for variable steps if the minimum required step is
>         relatively small.  Update calls to the dr_with_seg_len
>         constructor and to vect_compile_time_alias.
>         * tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
>         function.
>         (vect_loop_versioning): Call it.
>         * tree-vect-loop.c (vect_analyze_loop_2): Clear
>         LOOP_VINFO_LOWER_BOUNDS when retrying.
>         (vect_estimate_min_profitable_iters): Account for any bounds checks.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
>         than SLP vectorization.
>         * gcc.dg/vect/vect-alias-check-10.c: New test.
>         * gcc.dg/vect/vect-alias-check-11.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-12.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-8.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-9.c: Likewise.
>         * gcc.target/aarch64/sve_strided_load_8.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_1.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_1.h: Likewise.
>         * gcc.target/aarch64/sve_var_stride_1_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_2.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_2_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_3.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_3_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_4.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_4_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_5.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_5_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_6.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_6_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_7.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_7_run.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_8.c: Likewise.
>         * gcc.target/aarch64/sve_var_stride_8_run.c: Likewise.
>         * gfortran.dg/vect/vect-alias-check-1.F90: Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h       2017-11-17 22:07:58.227016438 +0000
> --- gcc/tree-vectorizer.h       2017-11-17 22:11:53.273439843 +0000
> *************** #define SLP_TREE_DEF_TYPE(S)                     (S)->def
> *** 174,179 ****
> --- 174,191 ----
>      loop to be valid.  */
>   typedef std::pair<tree, tree> vec_object_pair;
>
> + /* Records that vectorization is only possible if abs (EXPR) >= MIN_VALUE.
> +    UNSIGNED_P is true if we can assume that abs (EXPR) == EXPR.  */
> + struct vec_lower_bound {
> +   vec_lower_bound () {}
> +   vec_lower_bound (tree e, bool u, poly_uint64 m)
> +     : expr (e), unsigned_p (u), min_value (m) {}
> +
> +   tree expr;
> +   bool unsigned_p;
> +   poly_uint64 min_value;
> + };
> +
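
[Illustration, not part of the patch.]  A rough model of what such a
lower bound means at run time, using the x[i * n] = x[i * n + 1] + 1
example from the covering note.  This is a sketch in plain C, not the
code the vectoriser generates: VF, the function names and the
"all loads before all stores" model of a vector iteration are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor */

/* Return nonzero if abs (n) >= min_value, without overflowing on the
   most negative value of n.  */
int
step_at_least (ptrdiff_t n, size_t min_value)
{
  assert (min_value > 0);
  size_t magnitude = n < 0 ? (size_t) 0 - (size_t) n : (size_t) n;
  return magnitude >= min_value;
}

/* Scalar loop: x[i * n] = x[i * n + 1] + 1.  */
void
shift_scalar (int *x, ptrdiff_t n, size_t count)
{
  for (size_t i = 0; i < count; i++)
    x[(ptrdiff_t) i * n] = x[(ptrdiff_t) i * n + 1] + 1;
}

/* Model of a vector loop: each group of VF iterations does all of its
   loads before any of its stores.  */
void
shift_grouped (int *x, ptrdiff_t n, size_t count)
{
  size_t i = 0;
  for (; i + VF <= count; i += VF)
    {
      int tmp[VF];
      for (size_t j = 0; j < VF; j++)
        tmp[j] = x[(ptrdiff_t) (i + j) * n + 1] + 1;
      for (size_t j = 0; j < VF; j++)
        x[(ptrdiff_t) (i + j) * n] = tmp[j];
    }
  /* Scalar epilogue for the leftover iterations.  */
  shift_scalar (x + (ptrdiff_t) i * n, n, count - i);
}

/* Versioned loop: take the grouped path only if abs (n) >= 2.  */
void
shift_versioned (int *x, ptrdiff_t n, size_t count)
{
  if (step_at_least (n, 2))
    shift_grouped (x, n, count);
  else
    shift_scalar (x, n, count);
}
```

In this model n == -1 is the case that goes wrong without the check:
each scalar iteration reads the element written by the previous one,
which the grouped loads miss.  The bound also rejects n == 1 and
n == 0, so it is conservative rather than exact.
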
>   /* Vectorizer state common between loop and basic-block vectorization.  */
>   struct vec_info {
>     enum vec_kind { bb, loop };
> *************** typedef struct _loop_vec_info : public v
> *** 406,411 ****
> --- 418,431 ----
>     /* Check that the addresses of each pair of objects is unequal.  */
>     auto_vec<vec_object_pair> check_unequal_addrs;
>
> +   /* List of values that are required to be nonzero.  This is used to check
> +      whether things like "x[i * n] += 1;" are safe and eventually gets added
> +      to the checks for lower bounds below.  */
> +   auto_vec<tree> check_nonzero;
> +
> +   /* List of values that need to be checked for a minimum value.  */
> +   auto_vec<vec_lower_bound> lower_bounds;
> +
>     /* Statements in the loop that have data references that are candidates for a
>        runtime (loop versioning) misalignment check.  */
>     auto_vec<gimple *> may_misalign_stmts;
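
[Illustration, not part of the patch.]  To show why a value on the
check_nonzero list matters, here is a small model of the
"x[i * n] += 1" loop in plain C.  It is a sketch only: VF, the function
names and the "all loads before all stores" model of a vector
iteration are assumptions for the example, not what GCC emits.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor */

/* Scalar reference loop: x[i * n] += 1.  */
void
add_one_scalar (int *x, ptrdiff_t n, size_t count)
{
  for (size_t i = 0; i < count; i++)
    x[(ptrdiff_t) i * n] += 1;
}

/* Model of a vector loop: each group of VF iterations loads all of its
   elements, then stores them all.  Only equivalent to the scalar loop
   when n != 0, i.e. when the iterations touch distinct elements.  */
void
add_one_grouped (int *x, ptrdiff_t n, size_t count)
{
  size_t i = 0;
  for (; i + VF <= count; i += VF)
    {
      int tmp[VF];
      for (size_t j = 0; j < VF; j++)
        tmp[j] = x[(ptrdiff_t) (i + j) * n] + 1;
      for (size_t j = 0; j < VF; j++)
        x[(ptrdiff_t) (i + j) * n] = tmp[j];
    }
  /* Scalar epilogue for the leftover iterations.  */
  for (; i < count; i++)
    x[(ptrdiff_t) i * n] += 1;
}

/* Versioned loop: the runtime test is "step outside [0, 0]".  */
void
add_one_versioned (int *x, ptrdiff_t n, size_t count)
{
  assert (x != NULL);
  if (n != 0)
    add_one_grouped (x, n, count);
  else
    add_one_scalar (x, n, count);
}
```

With n == 0 every iteration updates x[0], so eight scalar iterations
add 8, whereas the grouped model would add only 1 per group of VF.
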
> *************** #define LOOP_VINFO_MAY_MISALIGN_STMTS(L)
> *** 514,519 ****
> --- 534,541 ----
>   #define LOOP_VINFO_MAY_ALIAS_DDRS(L)       (L)->may_alias_ddrs
>   #define LOOP_VINFO_COMP_ALIAS_DDRS(L)      (L)->comp_alias_ddrs
>   #define LOOP_VINFO_CHECK_UNEQUAL_ADDRS(L)  (L)->check_unequal_addrs
> + #define LOOP_VINFO_CHECK_NONZERO(L)        (L)->check_nonzero
> + #define LOOP_VINFO_LOWER_BOUNDS(L)         (L)->lower_bounds
>   #define LOOP_VINFO_GROUPED_STORES(L)       (L)->grouped_stores
>   #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
>   #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
> *************** #define LOOP_REQUIRES_VERSIONING_FOR_ALI
> *** 534,540 ****
>     ((L)->may_misalign_stmts.length () > 0)
>   #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L)         \
>     ((L)->comp_alias_ddrs.length () > 0 \
> !    || (L)->check_unequal_addrs.length () > 0)
>   #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L)                \
>     (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
>   #define LOOP_REQUIRES_VERSIONING(L)                   \
> --- 556,563 ----
>     ((L)->may_misalign_stmts.length () > 0)
>   #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L)         \
>     ((L)->comp_alias_ddrs.length () > 0 \
> !    || (L)->check_unequal_addrs.length () > 0 \
> !    || (L)->lower_bounds.length () > 0)
>   #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L)                \
>     (LOOP_VINFO_NITERS_ASSUMPTIONS (L))
>   #define LOOP_REQUIRES_VERSIONING(L)                   \
> Index: gcc/tree-data-ref.h
> ===================================================================
> *** gcc/tree-data-ref.h 2017-11-17 21:44:42.168891072 +0000
> --- gcc/tree-data-ref.h 2017-11-17 22:11:53.270650578 +0000
> *************** typedef struct data_reference *data_refe
> *** 203,213 ****
>
>   struct dr_with_seg_len
>   {
> !   dr_with_seg_len (data_reference_p d, tree len)
> !     : dr (d), seg_len (len) {}
>
>     data_reference_p dr;
>     tree seg_len;
>   };
>
>   /* This struct contains two dr_with_seg_len objects with aliasing data
> --- 203,222 ----
>
>   struct dr_with_seg_len
>   {
> !   dr_with_seg_len (data_reference_p d, tree len, unsigned HOST_WIDE_INT size,
> !                  unsigned int a)
> !     : dr (d), seg_len (len), access_size (size), align (a) {}
>
>     data_reference_p dr;
> +   /* The offset of the last access that needs to be checked minus
> +      the offset of the first.  */
>     tree seg_len;
> +   /* A value that, when added to abs (SEG_LEN), gives the total number of
> +      bytes in the segment.  */
> +   poly_uint64 access_size;
> +   /* The minimum common alignment of DR's start address, SEG_LEN and
> +      ACCESS_SIZE.  */
> +   unsigned int align;
>   };
>
>   /* This struct contains two dr_with_seg_len objects with aliasing data
> *************** extern void prune_runtime_alias_test_lis
> *** 475,480 ****
> --- 484,493 ----
>                                            poly_uint64);
>   extern void create_runtime_alias_checks (struct loop *,
>                                          vec<dr_with_seg_len_pair_t> *, tree*);
> + extern tree dr_direction_indicator (struct data_reference *);
> + extern tree dr_zero_step_indicator (struct data_reference *);
> + extern bool dr_known_forward_stride_p (struct data_reference *);
> +
>   /* Return true when the base objects of data references A and B are
>      the same memory object.  */
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> *** gcc/tree-data-ref.c 2017-11-17 21:48:20.720851310 +0000
> --- gcc/tree-data-ref.c 2017-11-17 22:11:53.270650578 +0000
> *************** Software Foundation; either version 3, o
> *** 95,100 ****
> --- 95,103 ----
>   #include "tree-affine.h"
>   #include "params.h"
>   #include "builtins.h"
> + #include "stringpool.h"
> + #include "tree-vrp.h"
> + #include "tree-ssanames.h"
>
>   static struct datadep_stats
>   {
> *************** runtime_alias_check_p (ddr_p ddr, struct
> *** 1307,1324 ****
>         return false;
>       }
>
> -   /* FORNOW: We don't support creating runtime alias tests for non-constant
> -      step.  */
> -   if (TREE_CODE (DR_STEP (DDR_A (ddr))) != INTEGER_CST
> -       || TREE_CODE (DR_STEP (DDR_B (ddr))) != INTEGER_CST)
> -     {
> -       if (dump_enabled_p ())
> -       dump_printf (MSG_MISSED_OPTIMIZATION,
> -                      "runtime alias check not supported for non-constant "
> -                    "step\n");
> -       return false;
> -     }
> -
>     return true;
>   }
>
> --- 1310,1315 ----
> *************** runtime_alias_check_p (ddr_p ddr, struct
> *** 1333,1343 ****
>   operator == (const dr_with_seg_len& d1,
>              const dr_with_seg_len& d2)
>   {
> !   return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
> !                         DR_BASE_ADDRESS (d2.dr), 0)
> !          && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
> !          && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
> !          && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
>   }
>
>   /* Comparison function for sorting objects of dr_with_seg_len_pair_t
> --- 1324,1336 ----
>   operator == (const dr_with_seg_len& d1,
>              const dr_with_seg_len& d2)
>   {
> !   return (operand_equal_p (DR_BASE_ADDRESS (d1.dr),
> !                          DR_BASE_ADDRESS (d2.dr), 0)
> !         && data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
> !         && data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
> !         && data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0
> !         && must_eq (d1.access_size, d2.access_size)
> !         && d1.align == d2.align);
>   }
>
>   /* Comparison function for sorting objects of dr_with_seg_len_pair_t
> *************** comp_dr_with_seg_len_pair (const void *p
> *** 1417,1423 ****
>
>   void
>   prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
> !                              poly_uint64 factor)
>   {
>     /* Sort the collected data ref pairs so that we can scan them once to
>        combine all possible aliasing checks.  */
> --- 1410,1416 ----
>
>   void
>   prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
> !                              poly_uint64)
>   {
>     /* Sort the collected data ref pairs so that we can scan them once to
>        combine all possible aliasing checks.  */
> *************** prune_runtime_alias_test_list (vec<dr_wi
> *** 1463,1468 ****
> --- 1456,1463 ----
>             }
>
>           poly_int64 init_a1, init_a2;
> +         /* Only consider cases in which the distance between the initial
> +            DR_A1 and the initial DR_A2 is known at compile time.  */
>           if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
>                                 DR_BASE_ADDRESS (dr_a2->dr), 0)
>               || !operand_equal_p (DR_OFFSET (dr_a1->dr),
> *************** prune_runtime_alias_test_list (vec<dr_wi
> *** 1482,1622 ****
>               std::swap (init_a1, init_a2);
>             }
>
> !         /* Only merge const step data references.  */
> !         poly_int64 step_a1, step_a2;
> !         if (!poly_int_tree_p (DR_STEP (dr_a1->dr), &step_a1)
> !             || !poly_int_tree_p (DR_STEP (dr_a2->dr), &step_a2))
> !           continue;
>
> !         bool neg_step = may_lt (step_a1, 0) || may_lt (step_a2, 0);
>
> !         /* DR_A1 and DR_A2 must go in the same direction.  */
> !         if (neg_step && (may_gt (step_a1, 0) || may_gt (step_a2, 0)))
> !           continue;
>
> !         poly_uint64 seg_len_a1 = 0, seg_len_a2 = 0;
> !         bool const_seg_len_a1 = poly_int_tree_p (dr_a1->seg_len,
> !                                                  &seg_len_a1);
> !         bool const_seg_len_a2 = poly_int_tree_p (dr_a2->seg_len,
> !                                                  &seg_len_a2);
> !
> !         /* We need to compute merged segment length at compilation time for
> !            dr_a1 and dr_a2, which is impossible if either one has non-const
> !            segment length.  */
> !         if ((!const_seg_len_a1 || !const_seg_len_a2)
> !             && may_ne (step_a1, step_a2))
> !           continue;
>
> !         bool do_remove = false;
> !         poly_uint64 diff = init_a2 - init_a1;
> !         poly_uint64 min_seg_len_b;
> !         tree new_seg_len;
>
> !         if (!poly_int_tree_p (dr_b1->seg_len, &min_seg_len_b))
>             {
> !             tree step_b = DR_STEP (dr_b1->dr);
> !             if (!tree_fits_shwi_p (step_b))
>                 continue;
> -             min_seg_len_b = factor * abs_hwi (tree_to_shwi (step_b));
> -           }
>
> !         /* Now we try to merge alias check dr_a1 & dr_b and dr_a2 & dr_b.
>
> !            Case A:
> !              check if the following condition is satisfied:
>
> !              DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
>
> !              where DIFF = DR_A2_INIT - DR_A1_INIT.  However,
> !              SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
> !              have to make a best estimation.  We can get the minimum value
> !              of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
> !              then either of the following two conditions can guarantee the
> !              one above:
> !
> !              1: DIFF <= MIN_SEG_LEN_B
> !              2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
> !                 Because DIFF - SEGMENT_LENGTH_A is done in sizetype, we need
> !                 to take care of wrapping behavior in it.
> !
> !            Case B:
> !              If the left segment does not extend beyond the start of the
> !              right segment the new segment length is that of the right
> !              plus the segment distance.  The condition is like:
> !
> !              DIFF >= SEGMENT_LENGTH_A   ;SEGMENT_LENGTH_A is a constant.
> !
> !            Note 1: Case A.2 and B combined together effectively merges every
> !            dr_a1 & dr_b and dr_a2 & dr_b when SEGMENT_LENGTH_A is const.
> !
> !            Note 2: Above description is based on positive DR_STEP, we need to
> !            take care of negative DR_STEP for wrapping behavior.  See PR80815
> !            for more information.  */
> !         if (neg_step)
> !           {
> !             /* Adjust diff according to access size of both references.  */
> !             diff += tree_to_poly_uint64
> !               (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a2->dr))));
> !             diff -= tree_to_poly_uint64
> !               (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a1->dr))));
> !             /* Case A.1.  */
> !             if (must_le (diff, min_seg_len_b)
> !                 /* Case A.2 and B combined.  */
> !                 || const_seg_len_a2)
> !               {
> !                 if (const_seg_len_a1 || const_seg_len_a2)
> !                   new_seg_len
> !                     = build_int_cstu (sizetype,
> !                                       lower_bound (seg_len_a1 - diff,
> !                                                    seg_len_a2));
> !                 else
> !                   new_seg_len
> !                     = size_binop (MINUS_EXPR, dr_a2->seg_len,
> !                                   build_int_cstu (sizetype, diff));
>
> !                 dr_a2->seg_len = new_seg_len;
> !                 do_remove = true;
> !               }
>             }
> -         else
> -           {
> -             /* Case A.1.  */
> -             if (must_le (diff, min_seg_len_b)
> -                 /* Case A.2 and B combined.  */
> -                 || const_seg_len_a1)
> -               {
> -                 if (const_seg_len_a1 && const_seg_len_a2)
> -                   new_seg_len
> -                     = build_int_cstu (sizetype,
> -                                       upper_bound (seg_len_a2 + diff,
> -                                                    seg_len_a1));
> -                 else
> -                   new_seg_len
> -                     = size_binop (PLUS_EXPR, dr_a2->seg_len,
> -                                   build_int_cstu (sizetype, diff));
>
> !                 dr_a1->seg_len = new_seg_len;
> !                 do_remove = true;
> !               }
> !           }
>
> !         if (do_remove)
>             {
> !             if (dump_enabled_p ())
> !               {
> !                 dump_printf (MSG_NOTE, "merging ranges for ");
> !                 dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
> !                 dump_printf (MSG_NOTE,  ", ");
> !                 dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1->dr));
> !                 dump_printf (MSG_NOTE,  " and ");
> !                 dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
> !                 dump_printf (MSG_NOTE,  ", ");
> !                 dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2->dr));
> !                 dump_printf (MSG_NOTE, "\n");
> !               }
> !             alias_pairs->ordered_remove (neg_step ? i - 1 : i);
> !             i--;
>             }
>         }
>       }
>   }
> --- 1477,1555 ----
>               std::swap (init_a1, init_a2);
>             }
>
> !         /* Work out what the segment length would be if we did combine
> !            DR_A1 and DR_A2:
>
> !            - If DR_A1 and DR_A2 have equal lengths, that length is
> !              also the combined length.
>
> !            - If DR_A1 and DR_A2 both have negative "lengths", the combined
> !              length is the lower bound on those lengths.
>
> !            - If DR_A1 and DR_A2 both have positive lengths, the combined
> !              length is the upper bound on those lengths.
>
> !            Other cases are unlikely to give a useful combination.
>
> !            The lengths both have sizetype, so the sign is taken from
> !            the step instead.  */
> !         if (!operand_equal_p (dr_a1->seg_len, dr_a2->seg_len, 0))
>             {
> !             poly_uint64 seg_len_a1, seg_len_a2;
> !             if (!poly_int_tree_p (dr_a1->seg_len, &seg_len_a1)
> !                 || !poly_int_tree_p (dr_a2->seg_len, &seg_len_a2))
>                 continue;
>
> !             tree indicator_a = dr_direction_indicator (dr_a1->dr);
> !             if (TREE_CODE (indicator_a) != INTEGER_CST)
> !               continue;
>
> !             tree indicator_b = dr_direction_indicator (dr_a2->dr);
> !             if (TREE_CODE (indicator_b) != INTEGER_CST)
> !               continue;
>
> !             int sign_a = tree_int_cst_sgn (indicator_a);
> !             int sign_b = tree_int_cst_sgn (indicator_b);
>
> !             poly_uint64 new_seg_len;
> !             if (sign_a <= 0 && sign_b <= 0)
> !               new_seg_len = lower_bound (seg_len_a1, seg_len_a2);
> !             else if (sign_a >= 0 && sign_b >= 0)
> !               new_seg_len = upper_bound (seg_len_a1, seg_len_a2);
> !             else
> !               continue;
>
> !             dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
> !                                             new_seg_len);
> !             dr_a1->align = MIN (dr_a1->align, known_alignment (new_seg_len));
>             }
>
> !         /* This is always positive due to the swap above.  */
> !         poly_uint64 diff = init_a2 - init_a1;
>
> !         /* The new check will start at DR_A1.  Make sure that its access
> !            size encompasses the initial DR_A2.  */
> !         if (may_lt (dr_a1->access_size, diff + dr_a2->access_size))
>             {
> !             dr_a1->access_size = upper_bound (dr_a1->access_size,
> !                                               diff + dr_a2->access_size);
> !             unsigned int new_align = known_alignment (dr_a1->access_size);
> !             dr_a1->align = MIN (dr_a1->align, new_align);
>             }
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf (MSG_NOTE, "merging ranges for ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
> +             dump_printf (MSG_NOTE,  ", ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1->dr));
> +             dump_printf (MSG_NOTE,  " and ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
> +             dump_printf (MSG_NOTE,  ", ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2->dr));
> +             dump_printf (MSG_NOTE, "\n");
> +           }
> +         alias_pairs->ordered_remove (i);
> +         i--;
>         }
>       }
>   }
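
[Illustration, not part of the patch.]  The access-size adjustment at
the end of the new merging code can be modelled in isolation.  The
sketch below uses plain integers instead of trees and poly_ints; the
struct and function names are hypothetical, and it assumes the two
segments already have equal seg_len.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of dr_with_seg_len that matter
   when merging two accesses that share a base and offset.  */
struct seg_model
{
  ptrdiff_t init;      /* constant byte offset from the shared base */
  ptrdiff_t seg_len;   /* distance from the first access to the last */
  size_t access_size;  /* bytes covered at each position */
};

/* Merge A2 into A1.  The segment length is left unchanged; only the
   access size grows so that it spans both accesses.  */
struct seg_model
merge_segs (struct seg_model a1, struct seg_model a2)
{
  assert (a1.seg_len == a2.seg_len);

  /* Make A1 the access with the lower starting offset.  */
  if (a2.init < a1.init)
    {
      struct seg_model tmp = a1;
      a1 = a2;
      a2 = tmp;
    }

  /* Nonnegative because of the swap above.  */
  size_t diff = (size_t) (a2.init - a1.init);
  if (a1.access_size < diff + a2.access_size)
    a1.access_size = diff + a2.access_size;
  return a1;
}
```

For two 4-byte accesses at offsets 0 and 4 with seg_len 12, the merged
check starts at offset 0, keeps seg_len 12 and uses access size 8.
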
> *************** create_intersect_range_checks_index (str
> *** 1656,1662 ****
>         || DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
>       return false;
>
> !   if (!tree_fits_uhwi_p (dr_a.seg_len) || !tree_fits_uhwi_p (dr_b.seg_len))
>       return false;
>
>     if (!tree_fits_shwi_p (DR_STEP (dr_a.dr)))
> --- 1589,1597 ----
>         || DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
>       return false;
>
> !   poly_uint64 seg_len1, seg_len2;
> !   if (!poly_int_tree_p (dr_a.seg_len, &seg_len1)
> !       || !poly_int_tree_p (dr_b.seg_len, &seg_len2))
>       return false;
>
>     if (!tree_fits_shwi_p (DR_STEP (dr_a.dr)))
> *************** create_intersect_range_checks_index (str
> *** 1671,1689 ****
>     gcc_assert (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST);
>
>     bool neg_step = tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0;
> !   unsigned HOST_WIDE_INT abs_step
> !     = absu_hwi (tree_to_shwi (DR_STEP (dr_a.dr)));
>
> -   unsigned HOST_WIDE_INT seg_len1 = tree_to_uhwi (dr_a.seg_len);
> -   unsigned HOST_WIDE_INT seg_len2 = tree_to_uhwi (dr_b.seg_len);
>     /* Infer the number of iterations with which the memory segment is accessed
>        by DR.  In other words, alias is checked if memory segment accessed by
>        DR_A in some iterations intersect with memory segment accessed by DR_B
>        in the same amount iterations.
>        Note segnment length is a linear function of number of iterations with
>        DR_STEP as the coefficient.  */
> !   unsigned HOST_WIDE_INT niter_len1 = (seg_len1 + abs_step - 1) / abs_step;
> !   unsigned HOST_WIDE_INT niter_len2 = (seg_len2 + abs_step - 1) / abs_step;
>
>     unsigned int i;
>     for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
> --- 1606,1647 ----
>     gcc_assert (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST);
>
>     bool neg_step = tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0;
> !   unsigned HOST_WIDE_INT abs_step = tree_to_shwi (DR_STEP (dr_a.dr));
> !   if (neg_step)
> !     {
> !       abs_step = -abs_step;
> !       seg_len1 = -seg_len1;
> !       seg_len2 = -seg_len2;
> !     }
> !   else
> !     {
> !       /* Include the access size in the length, so that we only have one
> !        tree addition below.  */
> !       seg_len1 += dr_a.access_size;
> !       seg_len2 += dr_b.access_size;
> !     }
>
>     /* Infer the number of iterations with which the memory segment is accessed
>        by DR.  In other words, alias is checked if memory segment accessed by
>        DR_A in some iterations intersect with memory segment accessed by DR_B
>        in the same amount iterations.
>        Note segnment length is a linear function of number of iterations with
>        DR_STEP as the coefficient.  */
> !   poly_uint64 niter_len1, niter_len2;
> !   if (!can_div_trunc_p (seg_len1 + abs_step - 1, abs_step, &niter_len1)
> !       || !can_div_trunc_p (seg_len2 + abs_step - 1, abs_step, &niter_len2))
> !     return false;
> !
> !   poly_uint64 niter_access1 = 0, niter_access2 = 0;
> !   if (neg_step)
> !     {
> !       /* Divide each access size by the byte step, rounding up.  */
> !       if (!can_div_trunc_p (dr_a.access_size + abs_step - 1,
> !                           abs_step, &niter_access1)
> !         || !can_div_trunc_p (dr_b.access_size + abs_step - 1,
> !                              abs_step, &niter_access2))
> !       return false;
> !     }
>
>     unsigned int i;
>     for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
> *************** create_intersect_range_checks_index (str
> *** 1734,1745 ****
>         /* Adjust ranges for negative step.  */
>         if (neg_step)
>         {
> !         min1 = fold_build2 (MINUS_EXPR, TREE_TYPE (min1), max1, idx_step);
> !         max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (min1),
> !                             CHREC_LEFT (access1), idx_step);
> !         min2 = fold_build2 (MINUS_EXPR, TREE_TYPE (min2), max2, idx_step);
> !         max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (min2),
> !                             CHREC_LEFT (access2), idx_step);
>         }
>         tree part_cond_expr
>         = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> --- 1692,1713 ----
>         /* Adjust ranges for negative step.  */
>         if (neg_step)
>         {
> !         /* IDX_LEN1 and IDX_LEN2 are negative in this case.  */
> !         std::swap (min1, max1);
> !         std::swap (min2, max2);
> !
> !         /* As with the lengths just calculated, we've measured the access
> !            sizes in iterations, so multiply them by the index step.  */
> !         tree idx_access1
> !           = fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
> !                          build_int_cst (TREE_TYPE (min1), niter_access1));
> !         tree idx_access2
> !           = fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
> !                          build_int_cst (TREE_TYPE (min2), niter_access2));
> !
> !         /* MINUS_EXPR because the above values are negative.  */
> !         max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (max1), max1, idx_access1);
> !         max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (max2), max2, idx_access2);
>         }
>         tree part_cond_expr
>         = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> *************** create_intersect_range_checks_index (str
> *** 1754,1759 ****
> --- 1722,1810 ----
>     return true;
>   }
>
> + /* If ALIGN is nonzero, set up *SEG_MIN_OUT and *SEG_MAX_OUT so that for
> +    every address ADDR accessed by D:
> +
> +      *SEG_MIN_OUT <= ADDR (== ADDR & -ALIGN) <= *SEG_MAX_OUT
> +
> +    In this case, every element accessed by D is aligned to at least
> +    ALIGN bytes.
> +
> +    If ALIGN is zero then instead set *SEG_MAX_OUT so that:
> +
> +      *SEG_MIN_OUT <= ADDR < *SEG_MAX_OUT.  */
> +
> + static void
> + get_segment_min_max (const dr_with_seg_len &d, tree *seg_min_out,
> +                    tree *seg_max_out, HOST_WIDE_INT align)
> + {
> +   /* Each access has the following pattern:
> +
> +         <- |seg_len| ->
> +         <--- A: -ve step --->
> +         +-----+-------+-----+-------+-----+
> +         | n-1 | ..... |  0  | ..... | n-1 |
> +         +-----+-------+-----+-------+-----+
> +                       <--- B: +ve step --->
> +                       <- |seg_len| ->
> +                       |
> +                  base address
> +
> +      where "n" is the number of scalar iterations covered by the segment.
> +      (This should be VF for a particular pair if we know that both steps
> +      are the same, otherwise it will be the full number of scalar loop
> +      iterations.)
> +
> +      A is the range of bytes accessed when the step is negative,
> +      B is the range when the step is positive.
> +
> +      If the access size is "access_size" bytes, the lowest addressed byte is:
> +
> +        base + (step < 0 ? seg_len : 0)   [LB]
> +
> +      and the highest addressed byte is always below:
> +
> +        base + (step < 0 ? 0 : seg_len) + access_size   [UB]
> +
> +      Thus:
> +
> +        LB <= ADDR < UB
> +
> +      If ALIGN is nonzero, all three values are aligned to at least ALIGN
> +      bytes, so:
> +
> +        LB <= ADDR <= UB - ALIGN
> +
> +      where "- ALIGN" folds naturally with the "+ access_size" and often
> +      cancels it out.
> +
> +      We don't try to simplify LB and UB beyond this (e.g. by using
> +      MIN and MAX based on whether seg_len rather than the stride is
> +      negative) because it is possible for the absolute size of the
> +      segment to overflow the range of a ssize_t.
> +
> +      Keeping the pointer_plus outside of the cond_expr should allow
> +      the cond_exprs to be shared with other alias checks.  */
> +   tree indicator = dr_direction_indicator (d.dr);
> +   tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
> +                              fold_convert (ssizetype, indicator),
> +                              ssize_int (0));
> +   tree addr_base = fold_build_pointer_plus (DR_BASE_ADDRESS (d.dr),
> +                                           DR_OFFSET (d.dr));
> +   addr_base = fold_build_pointer_plus (addr_base, DR_INIT (d.dr));
> +   tree seg_len = fold_convert (sizetype, d.seg_len);
> +
> +   tree min_reach = fold_build3 (COND_EXPR, sizetype, neg_step,
> +                               seg_len, size_zero_node);
> +   tree max_reach = fold_build3 (COND_EXPR, sizetype, neg_step,
> +                               size_zero_node, seg_len);
> +   max_reach = fold_build2 (PLUS_EXPR, sizetype, max_reach,
> +                          size_int (d.access_size - align));
> +
> +   *seg_min_out = fold_build_pointer_plus (addr_base, min_reach);
> +   *seg_max_out = fold_build_pointer_plus (addr_base, max_reach);
> + }
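
To check my reading of the LB/UB comment I wrote out a plain-integer model of get_segment_min_max.  This is only a sketch: the struct and function names are mine, addresses are modelled as int64_t rather than trees, and the sign of the step is passed in directly instead of coming from dr_direction_indicator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-integer stand-in for dr_with_seg_len (not GCC's type).
struct segment
{
  int64_t base;         // address of the first access in the segment
  int64_t seg_len;      // step * (n - 1); negative when the step is negative
  int64_t access_size;  // bytes accessed per iteration, always positive
};

struct bounds
{
  int64_t min;
  int64_t max;
};

// With align == 0 the result satisfies min <= ADDR < max (exclusive max).
// With align != 0 it satisfies min <= ADDR <= max (inclusive max), assuming
// every accessed address is a multiple of align.
bounds
segment_min_max (const segment &s, bool neg_step, int64_t align)
{
  assert (s.access_size > 0 && align >= 0);
  int64_t min_reach = neg_step ? s.seg_len : 0;
  int64_t max_reach = (neg_step ? 0 : s.seg_len) + s.access_size - align;
  return { s.base + min_reach, s.base + max_reach };
}
```

For a 4-byte access with seg_len 12 and align 4, the "+ access_size" and "- align" cancel, so the inclusive maximum is just base + seg_len, which is the folding the comment describes.
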
> +
>   /* Given two data references and segment lengths described by DR_A and DR_B,
>      create expression checking if the two addresses ranges intersect with
>      each other:
> *************** create_intersect_range_checks (struct lo
> *** 1770,1812 ****
>     if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
>       return;
>
> !   tree segment_length_a = dr_a.seg_len;
> !   tree segment_length_b = dr_b.seg_len;
> !   tree addr_base_a = DR_BASE_ADDRESS (dr_a.dr);
> !   tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr);
> !   tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr);
> !
> !   offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a),
> !                         offset_a, DR_INIT (dr_a.dr));
> !   offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b),
> !                         offset_b, DR_INIT (dr_b.dr));
> !   addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a);
> !   addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b);
> !
> !   tree seg_a_min = addr_base_a;
> !   tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> !   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
> !      bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
> !      [a, a+12) */
> !   if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
> !     {
> !       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a.dr)));
> !       seg_a_min = fold_build_pointer_plus (seg_a_max, unit_size);
> !       seg_a_max = fold_build_pointer_plus (addr_base_a, unit_size);
> !     }
> !
> !   tree seg_b_min = addr_base_b;
> !   tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> !   if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
> !     {
> !       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b.dr)));
> !       seg_b_min = fold_build_pointer_plus (seg_b_max, unit_size);
> !       seg_b_max = fold_build_pointer_plus (addr_base_b, unit_size);
>       }
>     *cond_expr
>       = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> !       fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min),
> !       fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min));
>   }
>
>   /* Create a conditional expression that represents the run-time checks for
> --- 1821,1868 ----
>     if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
>       return;
>
> !   unsigned HOST_WIDE_INT min_align;
> !   tree_code cmp_code;
> !   if (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST
> !       && TREE_CODE (DR_STEP (dr_b.dr)) == INTEGER_CST)
> !     {
> !       /* In this case adding access_size to seg_len is likely to give
> !        a simple X * step, where X is either the number of scalar
> !        iterations or the vectorization factor.  We're better off
> !        keeping that, rather than subtracting an alignment from it.
> !
> !        In this case the maximum values are exclusive and so there is
> !        no alias if the maximum of one segment equals the minimum
> !        of another.  */
> !       min_align = 0;
> !       cmp_code = LE_EXPR;
>       }
> +   else
> +     {
> +       /* Calculate the minimum alignment shared by all four pointers,
> +        then arrange for this alignment to be subtracted from the
> +        exclusive maximum values to get inclusive maximum values.
> +        This "- min_align" is cumulative with a "+ access_size"
> +        in the calculation of the maximum values.  In the best
> +        (and common) case, the two cancel each other out, leaving
> +        us with an inclusive bound based only on seg_len.  In the
> +        worst case we're simply adding a smaller number than before.
> +
> +        Because the maximum values are inclusive, there is an alias
> +        if the maximum value of one segment is equal to the minimum
> +        value of the other.  */
> +       min_align = MIN (dr_a.align, dr_b.align);
> +       cmp_code = LT_EXPR;
> +     }
> +
> +   tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
> +   get_segment_min_max (dr_a, &seg_a_min, &seg_a_max, min_align);
> +   get_segment_min_max (dr_b, &seg_b_min, &seg_b_max, min_align);
> +
>     *cond_expr
>       = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> !       fold_build2 (cmp_code, boolean_type_node, seg_a_max, seg_b_min),
> !       fold_build2 (cmp_code, boolean_type_node, seg_b_max, seg_a_min));
>   }
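
The choice of cmp_code took me a moment, so here is how I understand it, as a standalone sketch on integer ranges.  The type and function names are illustrative only and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address range; whether max is inclusive or exclusive depends
// on which of the two checks below is used.
struct range
{
  int64_t min;
  int64_t max;
};

// Exclusive maxima (the min_align == 0 case): a segment ending exactly where
// the other begins does not overlap it, so LE_EXPR is the right comparison.
bool
no_alias_exclusive (range a, range b)
{
  assert (a.min <= a.max && b.min <= b.max);
  return a.max <= b.min || b.max <= a.min;
}

// Inclusive maxima (the min_align != 0 case): max is itself an accessed
// address, so equality with the other minimum is an alias and LT_EXPR
// is needed.
bool
no_alias_inclusive (range a, range b)
{
  assert (a.min <= a.max && b.min <= b.max);
  return a.max < b.min || b.max < a.min;
}
```

Both forms should accept and reject the same pairs of segments; only the representation of the upper bound differs.
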
>
>   /* Create a conditional expression that represents the run-time checks for
> *************** free_data_refs (vec<data_reference_p> da
> *** 5277,5279 ****
> --- 5333,5422 ----
>       free_data_ref (dr);
>     datarefs.release ();
>   }
> +
> + /* Common routine implementing both dr_direction_indicator and
> +    dr_zero_step_indicator.  Return USEFUL_MIN if the indicator is known
> +    to be >= USEFUL_MIN and -1 if the indicator is known to be negative.
> +    Return the step as the indicator otherwise.  */
> +
> + static tree
> + dr_step_indicator (struct data_reference *dr, int useful_min)
> + {
> +   tree step = DR_STEP (dr);
> +   STRIP_NOPS (step);
> +   /* Look for cases where the step is scaled by a positive constant
> +      integer, which will often be the access size.  If the multiplication
> +      doesn't change the sign (due to overflow effects) then we can
> +      test the unscaled value instead.  */
> +   if (TREE_CODE (step) == MULT_EXPR
> +       && TREE_CODE (TREE_OPERAND (step, 1)) == INTEGER_CST
> +       && tree_int_cst_sgn (TREE_OPERAND (step, 1)) > 0)
> +     {
> +       tree factor = TREE_OPERAND (step, 1);
> +       step = TREE_OPERAND (step, 0);
> +
> +       /* Strip widening and truncating conversions as well as nops.  */
> +       if (CONVERT_EXPR_P (step)
> +         && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (step, 0))))
> +       step = TREE_OPERAND (step, 0);
> +       tree type = TREE_TYPE (step);
> +
> +       /* Get the range of step values that would not cause overflow.  */
> +       widest_int minv = (wi::to_widest (TYPE_MIN_VALUE (ssizetype))
> +                        / wi::to_widest (factor));
> +       widest_int maxv = (wi::to_widest (TYPE_MAX_VALUE (ssizetype))
> +                        / wi::to_widest (factor));
> +
> +       /* Get the range of values that the unconverted step actually has.  */
> +       wide_int step_min, step_max;
> +       if (TREE_CODE (step) != SSA_NAME
> +         || get_range_info (step, &step_min, &step_max) != VR_RANGE)
> +       {
> +         step_min = wi::to_wide (TYPE_MIN_VALUE (type));
> +         step_max = wi::to_wide (TYPE_MAX_VALUE (type));
> +       }
> +
> +       /* Check whether the unconverted step has an acceptable range.  */
> +       signop sgn = TYPE_SIGN (type);
> +       if (wi::les_p (minv, widest_int::from (step_min, sgn))
> +         && wi::ges_p (maxv, widest_int::from (step_max, sgn)))
> +       {
> +         if (wi::ge_p (step_min, useful_min, sgn))
> +           return ssize_int (useful_min);
> +         else if (wi::lt_p (step_max, 0, sgn))
> +           return ssize_int (-1);
> +         else
> +           return fold_convert (ssizetype, step);
> +       }
> +     }
> +   return DR_STEP (dr);
> + }
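
For reference, this is the decision I believe dr_step_indicator makes once it has found "step = x * factor", reduced to 64-bit integers.  It is a sketch under my own assumptions: int64_t stands in for ssizetype, the value range of x is passed in rather than queried with get_range_info, and the enum is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result kinds; the real function returns a tree instead.
enum class indicator_kind
{
  known_ge_min,    // would return ssize_int (useful_min)
  known_negative,  // would return ssize_int (-1)
  unscaled_value,  // would return the unscaled step x
  full_step        // scaling might overflow: fall back to DR_STEP
};

indicator_kind
classify_step (int64_t x_min, int64_t x_max, int64_t factor,
               int64_t useful_min)
{
  assert (factor > 0 && x_min <= x_max);
  // Range of x for which x * factor cannot overflow the signed type.
  const int64_t lo = std::numeric_limits<int64_t>::min () / factor;
  const int64_t hi = std::numeric_limits<int64_t>::max () / factor;
  if (x_min < lo || x_max > hi)
    return indicator_kind::full_step;
  if (x_min >= useful_min)
    return indicator_kind::known_ge_min;
  if (x_max < 0)
    return indicator_kind::known_negative;
  return indicator_kind::unscaled_value;
}
```

With useful_min == 0 this answers the direction question and with useful_min == 1 the zero-step question, which matches the two wrappers that follow.
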
> +
> + /* Return a value that is negative iff DR has a negative step.  */
> +
> + tree
> + dr_direction_indicator (struct data_reference *dr)
> + {
> +   return dr_step_indicator (dr, 0);
> + }
> +
> + /* Return a value that is zero iff DR has a zero step.  */
> +
> + tree
> + dr_zero_step_indicator (struct data_reference *dr)
> + {
> +   return dr_step_indicator (dr, 1);
> + }
> +
> + /* Return true if DR is known to have a nonnegative (but possibly zero)
> +    step.  */
> +
> + bool
> + dr_known_forward_stride_p (struct data_reference *dr)
> + {
> +   tree indicator = dr_direction_indicator (dr);
> +   tree neg_step_val = fold_binary (LT_EXPR, boolean_type_node,
> +                                  fold_convert (ssizetype, indicator),
> +                                  ssize_int (0));
> +   return neg_step_val && integer_zerop (neg_step_val);
> + }
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> *** gcc/tree-loop-distribution.c        2017-11-17 21:44:41.257891095 +0000
> --- gcc/tree-loop-distribution.c        2017-11-17 22:11:53.270650578 +0000
> *************** break_alias_scc_partitions (struct graph
> *** 2324,2339 ****
>   static tree
>   data_ref_segment_size (struct data_reference *dr, tree niters)
>   {
> !   tree segment_length;
> !
> !   if (integer_zerop (DR_STEP (dr)))
> !     segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> !   else
> !     segment_length = size_binop (MULT_EXPR,
> !                                fold_convert (sizetype, DR_STEP (dr)),
> !                                fold_convert (sizetype, niters));
> !
> !   return segment_length;
>   }
>
>   /* Return true if LOOP's latch is dominated by statement for data reference
> --- 2324,2335 ----
>   static tree
>   data_ref_segment_size (struct data_reference *dr, tree niters)
>   {
> !   niters = size_binop (MINUS_EXPR,
> !                      fold_convert (sizetype, niters),
> !                      size_one_node);
> !   return size_binop (MULT_EXPR,
> !                    fold_convert (sizetype, DR_STEP (dr)),
> !                    fold_convert (sizetype, niters));
>   }
>
>   /* Return true if LOOP's latch is dominated by statement for data reference
> *************** compute_alias_check_pairs (struct loop *
> *** 2388,2396 ****
>         else
>         seg_length_b = data_ref_segment_size (dr_b, niters);
>
>         dr_with_seg_len_pair_t dr_with_seg_len_pair
> !         (dr_with_seg_len (dr_a, seg_length_a),
> !          dr_with_seg_len (dr_b, seg_length_b));
>
>         /* Canonicalize pairs by sorting the two DR members.  */
>         if (comp_res > 0)
> --- 2384,2399 ----
>         else
>         seg_length_b = data_ref_segment_size (dr_b, niters);
>
> +       unsigned HOST_WIDE_INT access_size_a
> +       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
> +       unsigned HOST_WIDE_INT access_size_b
> +       = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
> +       unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
> +       unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
> +
>         dr_with_seg_len_pair_t dr_with_seg_len_pair
> !       (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
> !        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
>
>         /* Canonicalize pairs by sorting the two DR members.  */
>         if (comp_res > 0)
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> *** gcc/tree-vect-data-refs.c   2017-11-17 22:07:58.225016438 +0000
> --- gcc/tree-vect-data-refs.c   2017-11-17 22:11:53.272510088 +0000
> *************** vect_mark_for_runtime_alias_test (ddr_p
> *** 168,173 ****
> --- 168,217 ----
>     return true;
>   }
>
> + /* Record that loop LOOP_VINFO needs to check that VALUE is nonzero.  */
> +
> + static void
> + vect_check_nonzero_value (loop_vec_info loop_vinfo, tree value)
> + {
> +   vec<tree> checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
> +   for (unsigned int i = 0; i < checks.length (); ++i)
> +     if (checks[i] == value)
> +       return;
> +
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "need run-time check that ");
> +       dump_generic_expr (MSG_NOTE, TDF_SLIM, value);
> +       dump_printf (MSG_NOTE, " is nonzero\n");
> +     }
> +   LOOP_VINFO_CHECK_NONZERO (loop_vinfo).safe_push (value);
> + }
> +
> + /* Return true if we know that the order of vectorized STMT_A and
> +    vectorized STMT_B will be the same as the order of STMT_A and STMT_B.
> +    At least one of the statements is a write.  */
> +
> + static bool
> + vect_preserves_scalar_order_p (gimple *stmt_a, gimple *stmt_b)
> + {
> +   stmt_vec_info stmtinfo_a = vinfo_for_stmt (stmt_a);
> +   stmt_vec_info stmtinfo_b = vinfo_for_stmt (stmt_b);
> +
> +   /* Single statements are always kept in their original order.  */
> +   if (!STMT_VINFO_GROUPED_ACCESS (stmtinfo_a)
> +       && !STMT_VINFO_GROUPED_ACCESS (stmtinfo_b))
> +     return true;
> +
> +   /* STMT_A and STMT_B belong to overlapping groups.  All loads in a
> +      group are emitted at the position of the first scalar load and all
> +      stores in a group are emitted at the position of the last scalar store.
> +      Thus writes will happen no earlier than their current position
> +      (but could happen later) while reads will happen no later than their
> +      current position (but could happen earlier).  Reordering is therefore
> +      only possible if the first access is a write.  */
> +   gimple *earlier_stmt = get_earlier_stmt (stmt_a, stmt_b);
> +   return !DR_IS_WRITE (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt)));
> + }
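
The rule in vect_preserves_scalar_order_p can be stated very compactly, which I found useful when reading its callers.  The following is a toy model, assuming each access is described by three hypothetical fields of my own choosing; it has nothing to do with stmt_vec_info.

```cpp
#include <cassert>

// Hypothetical description of one scalar access in the loop body.
struct access
{
  int position;   // order of the scalar statement within the iteration
  bool grouped;   // true if the access is part of an interleaved group
  bool is_write;  // true for a store, false for a load
};

// Loads in a group move up to the first load and stores move down to the
// last store, so a pair can only be reordered if the earlier access is a
// write.
bool
preserves_scalar_order (const access &a, const access &b)
{
  assert (a.is_write || b.is_write);
  if (!a.grouped && !b.grouped)
    return true;
  const access &earlier = a.position < b.position ? a : b;
  return !earlier.is_write;
}
```
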
>
>   /* A subroutine of vect_analyze_data_ref_dependence.  Handle
>      DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence
> *************** vect_analyze_data_ref_dependence (struct
> *** 413,434 ****
>                 ... = a[i];
>                 a[i+1] = ...;
>              where loads from the group interleave with the store.  */
> !         if (STMT_VINFO_GROUPED_ACCESS (stmtinfo_a)
> !             || STMT_VINFO_GROUPED_ACCESS (stmtinfo_b))
>             {
> !             gimple *earlier_stmt;
> !             earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT (drb));
> !             if (DR_IS_WRITE
> !                   (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt))))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                                    "READ_WRITE dependence in interleaving."
> !                                    "\n");
>                   return true;
>                 }
>             }
> -
>           continue;
>         }
>
> --- 457,483 ----
>                 ... = a[i];
>                 a[i+1] = ...;
>              where loads from the group interleave with the store.  */
> !         if (!vect_preserves_scalar_order_p (DR_STMT (dra), DR_STMT (drb)))
> !           {
> !             if (dump_enabled_p ())
> !               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                                "READ_WRITE dependence in interleaving.\n");
> !             return true;
> !           }
> !
> !         if (!loop->force_vectorize)
>             {
> !             tree indicator = dr_zero_step_indicator (dra);
> !             if (TREE_CODE (indicator) != INTEGER_CST)
> !               vect_check_nonzero_value (loop_vinfo, indicator);
> !             else if (integer_zerop (indicator))
>                 {
>                   if (dump_enabled_p ())
>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                                "access also has a zero step\n");
>                   return true;
>                 }
>             }
>           continue;
>         }
>
> *************** vect_analyze_data_ref_accesses (vec_info
> *** 3025,3062 ****
>
>   /* Function vect_vfa_segment_size.
>
> -    Create an expression that computes the size of segment
> -    that will be accessed for a data reference.  The functions takes into
> -    account that realignment loads may access one more vector.
> -
>      Input:
>        DR: The data reference.
>        LENGTH_FACTOR: segment length to consider.
>
> !    Return an expression whose value is the size of segment which will be
> !    accessed by DR.  */
>
>   static tree
>   vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>   {
> !   tree segment_length;
>
> !   if (integer_zerop (DR_STEP (dr)))
> !     segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
> !   else
> !     segment_length = size_binop (MULT_EXPR,
> !                                fold_convert (sizetype, DR_STEP (dr)),
> !                                fold_convert (sizetype, length_factor));
>
> !   if (vect_supportable_dr_alignment (dr, false)
> !       == dr_explicit_realign_optimized)
>       {
> !       tree vector_size = TYPE_SIZE_UNIT
> !                         (STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
> !
> !       segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
>       }
> !   return segment_length;
>   }
>
>   /* Function vect_no_alias_p.
> --- 3074,3130 ----
>
>   /* Function vect_vfa_segment_size.
>
>      Input:
>        DR: The data reference.
>        LENGTH_FACTOR: segment length to consider.
>
> !    Return a value suitable for the dr_with_seg_len::seg_len field.
> !    This is the "distance travelled" by the pointer from the first
> !    iteration in the segment to the last.  Note that it does not include
> !    the size of the access; in effect it only describes the first byte.  */
>
>   static tree
>   vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
>   {
> !   length_factor = size_binop (MINUS_EXPR,
> !                             fold_convert (sizetype, length_factor),
> !                             size_one_node);
> !   return size_binop (MULT_EXPR, fold_convert (sizetype, DR_STEP (dr)),
> !                    length_factor);
> ! }
>
> ! /* Return a value that, when added to abs (vect_vfa_segment_size (dr)),
> !    gives the worst-case number of bytes covered by the segment.  */
>
> ! static unsigned HOST_WIDE_INT
> ! vect_vfa_access_size (data_reference *dr)
> ! {
> !   stmt_vec_info stmt_vinfo = vinfo_for_stmt (DR_STMT (dr));
> !   tree ref_type = TREE_TYPE (DR_REF (dr));
> !   unsigned HOST_WIDE_INT ref_size = tree_to_uhwi (TYPE_SIZE_UNIT (ref_type));
> !   unsigned HOST_WIDE_INT access_size = ref_size;
> !   if (GROUP_FIRST_ELEMENT (stmt_vinfo))
>       {
> !       gcc_assert (GROUP_FIRST_ELEMENT (stmt_vinfo) == DR_STMT (dr));
> !       access_size *= GROUP_SIZE (stmt_vinfo) - GROUP_GAP (stmt_vinfo);
> !     }
> !   if (STMT_VINFO_VEC_STMT (stmt_vinfo)
> !       && (vect_supportable_dr_alignment (dr, false)
> !         == dr_explicit_realign_optimized))
> !     {
> !       /* We might access a full vector's worth.  */
> !       tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
> !       access_size += tree_to_uhwi (TYPE_SIZE_UNIT (vectype)) - ref_size;
>       }
> !   return access_size;
> ! }
> !
> ! /* Get the minimum alignment for all the scalar accesses that DR describes.  */
> !
> ! static unsigned int
> ! vect_vfa_align (const data_reference *dr)
> ! {
> !   return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr)));
>   }
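
To make the new seg_len/access_size split concrete, here is the arithmetic as I understand it, with constant steps and plain integers.  The helper names are hypothetical, and the realignment adjustment in vect_vfa_access_size is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Distance travelled by the first byte over n iterations: step * (n - 1).
int64_t
seg_len (int64_t step, int64_t n)
{
  assert (n >= 1);
  return step * (n - 1);
}

// Bytes touched per iteration by a group, excluding the trailing gap.
// A non-grouped access corresponds to group_size == 1, group_gap == 0.
int64_t
access_size (int64_t ref_size, int64_t group_size, int64_t group_gap)
{
  assert (ref_size > 0 && group_gap >= 0 && group_size > group_gap);
  return ref_size * (group_size - group_gap);
}

// Worst-case number of bytes covered by the whole segment.
int64_t
bytes_covered (int64_t len, int64_t size)
{
  return (len < 0 ? -len : len) + size;
}
```

For a 4-byte access with a 16-byte step over 4 iterations this gives seg_len 48 and 52 bytes covered, regardless of the sign of the step.
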
>
>   /* Function vect_no_alias_p.
> *************** vect_vfa_segment_size (struct data_refer
> *** 3064,3076 ****
>      Given data references A and B with equal base and offset, see whether
>      the alias relation can be decided at compilation time.  Return 1 if
>      it can and the references alias, 0 if it can and the references do
> !    not alias, and -1 if we cannot decide at compile time.  SEGMENT_LENGTH_A
> !    and SEGMENT_LENGTH_B are the memory lengths accessed by A and B
> !    respectively.  */
>
>   static int
>   vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
> !                        tree segment_length_a, tree segment_length_b)
>   {
>     poly_offset_int offset_a = wi::to_poly_offset (DR_INIT (a));
>     poly_offset_int offset_b = wi::to_poly_offset (DR_INIT (b));
> --- 3132,3146 ----
>      Given data references A and B with equal base and offset, see whether
>      the alias relation can be decided at compilation time.  Return 1 if
>      it can and the references alias, 0 if it can and the references do
> !    not alias, and -1 if we cannot decide at compile time.  SEGMENT_LENGTH_A,
> !    SEGMENT_LENGTH_B, ACCESS_SIZE_A and ACCESS_SIZE_B are the equivalent
> !    of dr_with_seg_len::{seg_len,access_size} for A and B.  */
>
>   static int
>   vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
> !                        tree segment_length_a, tree segment_length_b,
> !                        unsigned HOST_WIDE_INT access_size_a,
> !                        unsigned HOST_WIDE_INT access_size_b)
>   {
>     poly_offset_int offset_a = wi::to_poly_offset (DR_INIT (a));
>     poly_offset_int offset_b = wi::to_poly_offset (DR_INIT (b));
> *************** vect_compile_time_alias (struct data_ref
> *** 3083,3100 ****
>     if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
>       {
>         const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi ();
> !       offset_a = (offset_a + vect_get_scalar_dr_size (a)) - const_length_a;
>       }
>     else
>       const_length_a = tree_to_poly_uint64 (segment_length_a);
>     if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
>       {
>         const_length_b = (-wi::to_poly_wide (segment_length_b)).force_uhwi ();
> !       offset_b = (offset_b + vect_get_scalar_dr_size (b)) - const_length_b;
>       }
>     else
>       const_length_b = tree_to_poly_uint64 (segment_length_b);
>
>     if (ranges_must_overlap_p (offset_a, const_length_a,
>                              offset_b, const_length_b))
>       return 1;
> --- 3153,3173 ----
>     if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
>       {
>         const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi ();
> !       offset_a = (offset_a + access_size_a) - const_length_a;
>       }
>     else
>       const_length_a = tree_to_poly_uint64 (segment_length_a);
>     if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
>       {
>         const_length_b = (-wi::to_poly_wide (segment_length_b)).force_uhwi ();
> !       offset_b = (offset_b + access_size_b) - const_length_b;
>       }
>     else
>       const_length_b = tree_to_poly_uint64 (segment_length_b);
>
> +   const_length_a += access_size_a;
> +   const_length_b += access_size_b;
> +
>     if (ranges_must_overlap_p (offset_a, const_length_a,
>                              offset_b, const_length_b))
>       return 1;
> *************** dependence_distance_ge_vf (data_dependen
> *** 3144,3149 ****
> --- 3217,3324 ----
>     return true;
>   }
>
> + /* Dump LOWER_BOUND using flags DUMP_KIND.  Dumps are known to be enabled.  */
> +
> + static void
> + dump_lower_bound (int dump_kind, const vec_lower_bound &lower_bound)
> + {
> +   dump_printf (dump_kind, "%s (", lower_bound.unsigned_p ? "unsigned" : "abs");
> +   dump_generic_expr (dump_kind, TDF_SLIM, lower_bound.expr);
> +   dump_printf (dump_kind, ") >= ");
> +   dump_dec (dump_kind, lower_bound.min_value);
> + }
> +
> + /* Record that the vectorized loop requires the vec_lower_bound described
> +    by EXPR, UNSIGNED_P and MIN_VALUE.  */
> +
> + static void
> + vect_check_lower_bound (loop_vec_info loop_vinfo, tree expr, bool unsigned_p,
> +                       poly_uint64 min_value)
> + {
> +   vec<vec_lower_bound> lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
> +   for (unsigned int i = 0; i < lower_bounds.length (); ++i)
> +     if (operand_equal_p (lower_bounds[i].expr, expr, 0))
> +       {
> +       unsigned_p &= lower_bounds[i].unsigned_p;
> +       min_value = upper_bound (lower_bounds[i].min_value, min_value);
> +       if (lower_bounds[i].unsigned_p != unsigned_p
> +           || may_lt (lower_bounds[i].min_value, min_value))
> +         {
> +           lower_bounds[i].unsigned_p = unsigned_p;
> +           lower_bounds[i].min_value = min_value;
> +           if (dump_enabled_p ())
> +             {
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "updating run-time check to ");
> +               dump_lower_bound (MSG_NOTE, lower_bounds[i]);
> +               dump_printf (MSG_NOTE, "\n");
> +             }
> +         }
> +       return;
> +       }
> +
> +   vec_lower_bound lower_bound (expr, unsigned_p, min_value);
> +   if (dump_enabled_p ())
> +     {
> +       dump_printf_loc (MSG_NOTE, vect_location, "need a run-time check that ");
> +       dump_lower_bound (MSG_NOTE, lower_bound);
> +       dump_printf (MSG_NOTE, "\n");
> +     }
> +   LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).safe_push (lower_bound);
> + }
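
The merging behaviour of vect_check_lower_bound is worth spelling out: a second request for the same expression tightens the existing entry instead of adding a new one.  A minimal sketch, assuming expressions can be compared as strings (the real code uses operand_equal_p) and that min_value is a plain integer rather than a poly_uint64:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified form of vec_lower_bound.
struct lower_bound
{
  std::string expr;    // stand-in for the tree expression
  bool unsigned_p;     // true: test (unsigned) expr, false: test abs (expr)
  uint64_t min_value;
};

void
record_lower_bound (std::vector<lower_bound> &bounds, const std::string &expr,
                    bool unsigned_p, uint64_t min_value)
{
  for (lower_bound &lb : bounds)
    if (lb.expr == expr)
      {
        // Keep the unsigned form only if both requests allow it, and keep
        // the stricter of the two minimum values.
        lb.unsigned_p = lb.unsigned_p && unsigned_p;
        lb.min_value = std::max (lb.min_value, min_value);
        return;
      }
  bounds.push_back ({ expr, unsigned_p, min_value });
}
```

So a nonzero-step check on n (unsigned, >= 1) followed by an abs (n) >= 8 request ends up as a single abs (n) >= 8 test.
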
> +
> + /* Return true if it's unlikely that the step of the vectorized form of DR
> +    will span fewer than GAP bytes.  */
> +
> + static bool
> + vect_small_gap_p (loop_vec_info loop_vinfo, data_reference *dr, poly_int64 gap)
> + {
> +   stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
> +   HOST_WIDE_INT count
> +     = estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +   if (GROUP_FIRST_ELEMENT (stmt_info))
> +     count *= GROUP_SIZE (vinfo_for_stmt (GROUP_FIRST_ELEMENT (stmt_info)));
> +   return estimated_poly_value (gap) <= count * vect_get_scalar_dr_size (dr);
> + }
> +
> + /* Return true if we know that there is no alias between DR_A and DR_B
> +    when abs (DR_STEP (DR_A)) >= N for some N.  When returning true, set
> +    *LOWER_BOUND_OUT to this N.  */
> +
> + static bool
> + vectorizable_with_step_bound_p (data_reference *dr_a, data_reference *dr_b,
> +                               poly_uint64 *lower_bound_out)
> + {
> +   /* Check that there is a constant gap of known sign between DR_A
> +      and DR_B.  */
> +   poly_int64 init_a, init_b;
> +   if (!operand_equal_p (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b), 0)
> +       || !operand_equal_p (DR_OFFSET (dr_a), DR_OFFSET (dr_b), 0)
> +       || !operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)
> +       || !poly_int_tree_p (DR_INIT (dr_a), &init_a)
> +       || !poly_int_tree_p (DR_INIT (dr_b), &init_b)
> +       || !ordered_p (init_a, init_b))
> +     return false;
> +
> +   /* Sort DR_A and DR_B by the address they access.  */
> +   if (may_lt (init_b, init_a))
> +     {
> +       std::swap (init_a, init_b);
> +       std::swap (dr_a, dr_b);
> +     }
> +
> +   /* If the two accesses could be dependent within a scalar iteration,
> +      make sure that we'd retain their order.  */
> +   if (may_gt (init_a + vect_get_scalar_dr_size (dr_a), init_b)
> +       && !vect_preserves_scalar_order_p (DR_STMT (dr_a), DR_STMT (dr_b)))
> +     return false;
> +
> +   /* There is no alias if abs (DR_STEP) is greater than or equal to
> +      the bytes spanned by the combination of the two accesses.  */
> +   *lower_bound_out = init_b + vect_get_scalar_dr_size (dr_b) - init_a;
> +   return true;
> + }
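
As a sanity check on the bound computed here, I worked through case 2 from the cover letter, x[i * n] = x[i * n + 1] + 1, assuming 4-byte elements.  The helpers below are hypothetical and use constant integers where the patch uses poly_int64 and trees.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Bytes spanned by the two accesses together, after sorting by DR_INIT.
uint64_t
step_lower_bound (int64_t init_a, uint64_t size_a,
                  int64_t init_b, uint64_t size_b)
{
  if (init_b < init_a)
    {
      std::swap (init_a, init_b);
      std::swap (size_a, size_b);
    }
  return static_cast<uint64_t> (init_b - init_a) + size_b;
}

// The run-time test: no alias when abs (step) >= bound.
bool
step_bound_holds (int64_t step, uint64_t bound)
{
  uint64_t magnitude = (step < 0
                        ? 0 - static_cast<uint64_t> (step)
                        : static_cast<uint64_t> (step));
  return magnitude >= bound;
}
```

With DR_INITs of 0 and 4 and a 4-byte access the bound is 8 bytes.  Since the byte step is n * 4, that is the same as abs (n) >= 2, which agrees with the description in the cover letter.
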
> +
>   /* Function vect_prune_runtime_alias_test_list.
>
>      Prune a list of ddrs to be tested at run-time by versioning for alias.
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3173,3178 ****
> --- 3348,3366 ----
>       dump_printf_loc (MSG_NOTE, vect_location,
>                        "=== vect_prune_runtime_alias_test_list ===\n");
>
> +   /* Step values are irrelevant for aliasing if the number of vector
> +      iterations is equal to the number of scalar iterations (which can
> +      happen for fully-SLP loops).  */
> +   bool ignore_step_p = must_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1U);
> +
> +   if (!ignore_step_p)
> +     {
> +       /* Convert the checks for nonzero steps into bound tests.  */
> +       tree value;
> +       FOR_EACH_VEC_ELT (LOOP_VINFO_CHECK_NONZERO (loop_vinfo), i, value)
> +       vect_check_lower_bound (loop_vinfo, value, true, 1);
> +     }
> +
>     if (may_alias_ddrs.is_empty ())
>       return true;
>
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3186,3194 ****
> --- 3374,3385 ----
>     FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>       {
>         int comp_res;
> +       poly_uint64 lower_bound;
>         struct data_reference *dr_a, *dr_b;
>         gimple *dr_group_first_a, *dr_group_first_b;
>         tree segment_length_a, segment_length_b;
> +       unsigned HOST_WIDE_INT access_size_a, access_size_b;
> +       unsigned int align_a, align_b;
>         gimple *stmt_a, *stmt_b;
>
>         /* Ignore the alias if the VF we chose ended up being no greater
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3216,3221 ****
> --- 3407,3470 ----
>
>         dr_a = DDR_A (ddr);
>         stmt_a = DR_STMT (DDR_A (ddr));
> +
> +       dr_b = DDR_B (ddr);
> +       stmt_b = DR_STMT (DDR_B (ddr));
> +
> +       /* Skip the pair if inter-iteration dependencies are irrelevant
> +        and intra-iteration dependencies are guaranteed to be honored.  */
> +       if (ignore_step_p
> +         && (vect_preserves_scalar_order_p (stmt_a, stmt_b)
> +             || vectorizable_with_step_bound_p (dr_a, dr_b, &lower_bound)))
> +       {
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "no need for alias check between ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> +             dump_printf (MSG_NOTE, " and ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> +             dump_printf (MSG_NOTE, " when VF is 1\n");
> +           }
> +         continue;
> +       }
> +
> +       /* See whether we can handle the alias using a bounds check on
> +        the step, and whether that's likely to be the best approach.
> +        (It might not be, for example, if the minimum step is much larger
> +        than the number of bytes handled by one vector iteration.)  */
> +       if (!ignore_step_p
> +         && TREE_CODE (DR_STEP (dr_a)) != INTEGER_CST
> +         && vectorizable_with_step_bound_p (dr_a, dr_b, &lower_bound)
> +         && (vect_small_gap_p (loop_vinfo, dr_a, lower_bound)
> +             || vect_small_gap_p (loop_vinfo, dr_b, lower_bound)))
> +       {
> +         bool unsigned_p = dr_known_forward_stride_p (dr_a);
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location, "no alias between ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> +             dump_printf (MSG_NOTE, " and ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> +             dump_printf (MSG_NOTE, " when the step ");
> +             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_STEP (dr_a));
> +             dump_printf (MSG_NOTE, " is outside ");
> +             if (unsigned_p)
> +               dump_printf (MSG_NOTE, "[0");
> +             else
> +               {
> +                 dump_printf (MSG_NOTE, "(");
> +                 dump_dec (MSG_NOTE, poly_int64 (-lower_bound));
> +               }
> +             dump_printf (MSG_NOTE, ", ");
> +             dump_dec (MSG_NOTE, lower_bound);
> +             dump_printf (MSG_NOTE, ")\n");
> +           }
> +         vect_check_lower_bound (loop_vinfo, DR_STEP (dr_a), unsigned_p,
> +                                 lower_bound);
> +         continue;
> +       }
> +
>         dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>         if (dr_group_first_a)
>         {
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3223,3230 ****
>           dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>         }
>
> -       dr_b = DDR_B (ddr);
> -       stmt_b = DR_STMT (DDR_B (ddr));
>         dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>         if (dr_group_first_b)
>         {
> --- 3472,3477 ----
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3232,3243 ****
>           dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>         }
>
> !       if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
> !       length_factor = scalar_loop_iters;
>         else
> !       length_factor = size_int (vect_factor);
> !       segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
> !       segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>
>         comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
>                                         DR_BASE_ADDRESS (dr_b));
> --- 3479,3502 ----
>           dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>         }
>
> !       if (ignore_step_p)
> !       {
> !         segment_length_a = size_zero_node;
> !         segment_length_b = size_zero_node;
> !       }
>         else
> !       {
> !         if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
> !           length_factor = scalar_loop_iters;
> !         else
> !           length_factor = size_int (vect_factor);
> !         segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
> !         segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
> !       }
> !       access_size_a = vect_vfa_access_size (dr_a);
> !       access_size_b = vect_vfa_access_size (dr_b);
> !       align_a = vect_vfa_align (dr_a);
> !       align_b = vect_vfa_align (dr_b);
>
>         comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
>                                         DR_BASE_ADDRESS (dr_b));
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3254,3260 ****
>         {
>           int res = vect_compile_time_alias (dr_a, dr_b,
>                                              segment_length_a,
> !                                            segment_length_b);
>           if (res == 0)
>             continue;
>
> --- 3513,3534 ----
>         {
>           int res = vect_compile_time_alias (dr_a, dr_b,
>                                              segment_length_a,
> !                                            segment_length_b,
> !                                            access_size_a,
> !                                            access_size_b);
> !         if (res >= 0 && dump_enabled_p ())
> !           {
> !             dump_printf_loc (MSG_NOTE, vect_location,
> !                              "can tell at compile time that ");
> !             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> !             dump_printf (MSG_NOTE, " and ");
> !             dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> !             if (res == 0)
> !               dump_printf (MSG_NOTE, " do not alias\n");
> !             else
> !               dump_printf (MSG_NOTE, " alias\n");
> !           }
> !
>           if (res == 0)
>             continue;
>
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3268,3275 ****
>         }
>
>         dr_with_seg_len_pair_t dr_with_seg_len_pair
> !         (dr_with_seg_len (dr_a, segment_length_a),
> !          dr_with_seg_len (dr_b, segment_length_b));
>
>         /* Canonicalize pairs by sorting the two DR members.  */
>         if (comp_res > 0)
> --- 3542,3549 ----
>         }
>
>         dr_with_seg_len_pair_t dr_with_seg_len_pair
> !       (dr_with_seg_len (dr_a, segment_length_a, access_size_a, align_a),
> !        dr_with_seg_len (dr_b, segment_length_b, access_size_b, align_b));
>
>         /* Canonicalize pairs by sorting the two DR members.  */
>         if (comp_res > 0)
> *************** vect_prune_runtime_alias_test_list (loop
> *** 3282,3287 ****
> --- 3556,3562 ----
>
>     unsigned int count = (comp_alias_ddrs.length ()
>                         + check_unequal_addrs.length ());
> +
>     dump_printf_loc (MSG_NOTE, vect_location,
>                    "improved number of alias checks from %d to %d\n",
>                    may_alias_ddrs.length (), count);
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> *** gcc/tree-vect-loop-manip.c  2017-11-17 21:48:20.590673490 +0000
> --- gcc/tree-vect-loop-manip.c  2017-11-17 22:11:53.272510088 +0000
> *************** vect_create_cond_for_unequal_addrs (loop
> *** 2851,2856 ****
> --- 2851,2881 ----
>       }
>   }
>
> + /* Create an expression that is true when all lower-bound conditions for
> +    the vectorized loop are met.  Chain this condition with *COND_EXPR.  */
> +
> + static void
> + vect_create_cond_for_lower_bounds (loop_vec_info loop_vinfo, tree *cond_expr)
> + {
> +   vec<vec_lower_bound> lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
> +   for (unsigned int i = 0; i < lower_bounds.length (); ++i)
> +     {
> +       tree expr = lower_bounds[i].expr;
> +       tree type = unsigned_type_for (TREE_TYPE (expr));
> +       expr = fold_convert (type, expr);
> +       poly_uint64 bound = lower_bounds[i].min_value;
> +       if (!lower_bounds[i].unsigned_p)
> +       {
> +         expr = fold_build2 (PLUS_EXPR, type, expr,
> +                             build_int_cstu (type, bound - 1));
> +         bound += bound - 1;
> +       }
> +       tree part_cond_expr = fold_build2 (GE_EXPR, boolean_type_node, expr,
> +                                        build_int_cstu (type, bound));
> +       chain_cond_expr (cond_expr, part_cond_expr);
> +     }
> + }
> +
>   /* Function vect_create_cond_for_alias_checks.
>
>      Create a conditional expression that represents the run-time checks for
> *************** vect_loop_versioning (loop_vec_info loop
> *** 2962,2967 ****
> --- 2987,2993 ----
>     if (version_alias)
>       {
>         vect_create_cond_for_unequal_addrs (loop_vinfo, &cond_expr);
> +       vect_create_cond_for_lower_bounds (loop_vinfo, &cond_expr);
>         vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr);
>       }
>
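
For anyone following the signed case in vect_create_cond_for_lower_bounds:
adding bound - 1 and then comparing against 2 * bound - 1 in the unsigned
type appears to be a single-comparison way of testing abs (expr) >= bound.
Below is a minimal standalone sketch of that arithmetic, assuming a 32-bit
step and a bound of at least 1.  step_outside_range is a hypothetical helper
written for illustration, not something in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic of the quoted function.
   Assumes a 32-bit step and 1 <= BOUND <= 2^31.

   If UNSIGNED_P, STEP is known to be nonnegative and the test is simply
   STEP >= BOUND, i.e. STEP is outside [0, BOUND).

   Otherwise the test is abs (STEP) >= BOUND, i.e. STEP is outside
   (-BOUND, BOUND).  Adding BOUND - 1 maps the excluded range
   [-(BOUND - 1), BOUND - 1] onto [0, 2 * BOUND - 2], so one unsigned
   comparison against 2 * BOUND - 1 is enough.  */
static bool
step_outside_range (int32_t step, uint32_t bound, bool unsigned_p)
{
  uint32_t expr = (uint32_t) step;
  if (!unsigned_p)
    {
      expr += bound - 1;   /* The bias; wraps modulo 2^32 by design.  */
      bound += bound - 1;  /* The new limit is 2 * BOUND - 1.  */
    }
  return expr >= bound;
}
```

With bound == 2 and unsigned_p false, steps of -1, 0 and 1 fail the test and
every other step passes, which matches the "abs (n) >= 2" example in the
covering note.
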
> Index: gcc/tree-vect-loop.c
> ===================================================================
> *** gcc/tree-vect-loop.c        2017-11-17 21:48:20.646463984 +0000
> --- gcc/tree-vect-loop.c        2017-11-17 22:11:53.273439843 +0000
> *************** vect_analyze_loop_2 (loop_vec_info loop_
> *** 2472,2477 ****
> --- 2472,2478 ----
>         }
>       }
>     /* Free optimized alias test DDRS.  */
> +   LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).truncate (0);
>     LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).release ();
>     LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo).release ();
>     /* Reset target cost data.  */
> *************** vect_estimate_min_profitable_iters (loop
> *** 3668,3673 ****
> --- 3669,3686 ----
>         /* Count LEN - 1 ANDs and LEN comparisons.  */
>         (void) add_stmt_cost (target_cost_data, len * 2 - 1, scalar_stmt,
>                               NULL, 0, vect_prologue);
> +       len = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).length ();
> +       if (len)
> +       {
> +         /* Count LEN - 1 ANDs and LEN comparisons.  */
> +         unsigned int nstmts = len * 2 - 1;
> +         /* +1 for each bias that needs adding.  */
> +         for (unsigned int i = 0; i < len; ++i)
> +           if (!LOOP_VINFO_LOWER_BOUNDS (loop_vinfo)[i].unsigned_p)
> +             nstmts += 1;
> +         (void) add_stmt_cost (target_cost_data, nstmts, scalar_stmt,
> +                               NULL, 0, vect_prologue);
> +       }
>         dump_printf (MSG_NOTE,
>                      "cost model: Adding cost of checks for loop "
>                      "versioning aliasing.\n");
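
As a worked example of the statement count above, here is a small sketch
under the assumption that only the unsigned_p flag matters for the cost.
struct lower_bound_sketch and lower_bound_check_stmts are hypothetical names
used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's vec_lower_bound; only the field
   that affects the cost is modelled.  */
struct lower_bound_sketch
{
  bool unsigned_p;
};

/* Count the scalar statements needed for LEN lower-bound checks:
   LEN comparisons, LEN - 1 ANDs to chain them, and one extra addition
   for each signed bound that needs the bias.  */
static unsigned int
lower_bound_check_stmts (const struct lower_bound_sketch *bounds, size_t len)
{
  if (len == 0)
    return 0;
  unsigned int nstmts = (unsigned int) (len * 2 - 1);
  for (size_t i = 0; i < len; ++i)
    if (!bounds[i].unsigned_p)
      nstmts += 1;
  return nstmts;
}
```

For three bounds of which two are signed, this gives 3 comparisons, 2 ANDs
and 2 bias additions, so 7 statements.
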
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c   2017-11-09 15:15:06.847657980 +0000
> --- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c   2017-11-17 22:11:53.266001803 +0000
> *************** int main ()
> *** 45,54 ****
>     return 0;
>   }
>
> ! /* Basic blocks of if-converted loops are vectorized from within the loop
> !    vectorizer pass.  In this case it is really a deficiency in loop
> !    vectorization data dependence analysis that causes us to require
> !    basic block vectorization in the first place.  */
> !
> ! /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "vect" { target vect_element_align } } } */
>
> --- 45,50 ----
>     return 0;
>   }
>
> ! /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
> ! /* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target vect_element_align } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c     2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,69 ----
> + /* { dg-do run } */
> +
> + #define N 87
> + #define M 6
> +
> + typedef signed char sc;
> + typedef unsigned char uc;
> + typedef signed short ss;
> + typedef unsigned short us;
> + typedef int si;
> + typedef unsigned int ui;
> + typedef signed long long sll;
> + typedef unsigned long long ull;
> +
> + #define FOR_EACH_TYPE(M) \
> +   M (sc) M (uc) \
> +   M (ss) M (us) \
> +   M (si) M (ui) \
> +   M (sll) M (ull) \
> +   M (float) M (double)
> +
> + #define TEST_VALUE(I) ((I) * 5 / 2)
> +
> + #define ADD_TEST(TYPE)                                \
> +   void __attribute__((noinline, noclone))     \
> +   test_##TYPE (TYPE *a, int step)             \
> +   {                                           \
> +     for (int i = 0; i < N; ++i)                       \
> +       {                                               \
> +       a[i * step + 0] = a[i * step + 0] + 1;  \
> +       a[i * step + 1] = a[i * step + 1] + 2;  \
> +       a[i * step + 2] = a[i * step + 2] + 4;  \
> +       a[i * step + 3] = a[i * step + 3] + 8;  \
> +       }                                               \
> +   }                                           \
> +   void __attribute__((noinline, noclone))     \
> +   ref_##TYPE (TYPE *a, int step)              \
> +   {                                           \
> +     for (int i = 0; i < N; ++i)                       \
> +       {                                               \
> +       a[i * step + 0] = a[i * step + 0] + 1;  \
> +       a[i * step + 1] = a[i * step + 1] + 2;  \
> +       a[i * step + 2] = a[i * step + 2] + 4;  \
> +       a[i * step + 3] = a[i * step + 3] + 8;  \
> +       asm volatile ("");                      \
> +       }                                               \
> +   }
> +
> + #define DO_TEST(TYPE)                                 \
> +   for (int j = -M; j <= M; ++j)                               \
> +     {                                                 \
> +       TYPE a[N * M], b[N * M];                                \
> +       for (int i = 0; i < N * M; ++i)                 \
> +       a[i] = b[i] = TEST_VALUE (i);                   \
> +       int offset = (j < 0 ? N * M - 4 : 0);           \
> +       test_##TYPE (a + offset, j);                    \
> +       ref_##TYPE (b + offset, j);                     \
> +       if (__builtin_memcmp (a, b, sizeof (a)) != 0)   \
> +       __builtin_abort ();                             \
> +     }
> +
> + FOR_EACH_TYPE (ADD_TEST)
> +
> + int
> + main (void)
> + {
> +   FOR_EACH_TYPE (DO_TEST)
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c     2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,99 ----
> + /* { dg-do run } */
> +
> + #define N 87
> + #define M 6
> +
> + typedef signed char sc;
> + typedef unsigned char uc;
> + typedef signed short ss;
> + typedef unsigned short us;
> + typedef int si;
> + typedef unsigned int ui;
> + typedef signed long long sll;
> + typedef unsigned long long ull;
> +
> + #define FOR_EACH_TYPE(M) \
> +   M (sc) M (uc) \
> +   M (ss) M (us) \
> +   M (si) M (ui) \
> +   M (sll) M (ull) \
> +   M (float) M (double)
> +
> + #define TEST_VALUE1(I) ((I) * 5 / 2)
> + #define TEST_VALUE2(I) ((I) * 11 / 5)
> +
> + #define ADD_TEST(TYPE)                                        \
> +   void __attribute__((noinline, noclone))             \
> +   test_##TYPE (TYPE *restrict a, TYPE *restrict b,    \
> +              int step)                                \
> +   {                                                   \
> +     for (int i = 0; i < N; ++i)                               \
> +       {                                                       \
> +       TYPE r1 = a[i * step + 0] += 1;                 \
> +       a[i * step + 1] += 2;                           \
> +       a[i * step + 2] += 4;                           \
> +       a[i * step + 3] += 8;                           \
> +       b[i] += r1;                                     \
> +       }                                                       \
> +   }                                                   \
> +                                                       \
> +   void __attribute__((noinline, noclone))             \
> +   ref_##TYPE (TYPE *restrict a, TYPE *restrict b,     \
> +             int step)                                 \
> +   {                                                   \
> +     for (int i = 0; i < N; ++i)                               \
> +       {                                                       \
> +       TYPE r1 = a[i * step + 0] += 1;                 \
> +       a[i * step + 1] += 2;                           \
> +       a[i * step + 2] += 4;                           \
> +       a[i * step + 3] += 8;                           \
> +       b[i] += r1;                                     \
> +       asm volatile ("");                              \
> +       }                                                       \
> +   }
> +
> + #define DO_TEST(TYPE)                                 \
> +   for (int j = -M; j <= M; ++j)                               \
> +     {                                                 \
> +       TYPE a1[N * M], a2[N * M], b1[N], b2[N];                \
> +       for (int i = 0; i < N * M; ++i)                 \
> +       a1[i] = a2[i] = TEST_VALUE1 (i);                \
> +       for (int i = 0; i < N; ++i)                     \
> +       b1[i] = b2[i] = TEST_VALUE2 (i);                \
> +       int offset = (j < 0 ? N * M - 4 : 0);           \
> +       test_##TYPE (a1 + offset, b1, j);                       \
> +       ref_##TYPE (a2 + offset, b2, j);                        \
> +       if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0)        \
> +       __builtin_abort ();                             \
> +       if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0)        \
> +       __builtin_abort ();                             \
> +     }
> +
> + FOR_EACH_TYPE (ADD_TEST)
> +
> + int
> + main (void)
> + {
> +   FOR_EACH_TYPE (DO_TEST)
> +   return 0;
> + }
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-2, 2\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-3, 3\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]*\) >= 4} "vect" { target vect_int } } } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-6, 6\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-8, 8\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-8, 8\)} "vect" { target { vect_int || vect_float } } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-12, 12\)} "vect" { target { vect_int || vect_float } } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-16, 16\)} "vect" { target { vect_int || vect_float } } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } } } } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-16, 16\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-24, 24\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-32, 32\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c     2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,99 ----
> + /* { dg-do run } */
> +
> + #define N 87
> + #define M 7
> +
> + typedef signed char sc;
> + typedef unsigned char uc;
> + typedef signed short ss;
> + typedef unsigned short us;
> + typedef int si;
> + typedef unsigned int ui;
> + typedef signed long long sll;
> + typedef unsigned long long ull;
> +
> + #define FOR_EACH_TYPE(M) \
> +   M (sc) M (uc) \
> +   M (ss) M (us) \
> +   M (si) M (ui) \
> +   M (sll) M (ull) \
> +   M (float) M (double)
> +
> + #define TEST_VALUE1(I) ((I) * 5 / 2)
> + #define TEST_VALUE2(I) ((I) * 11 / 5)
> +
> + #define ADD_TEST(TYPE)                                        \
> +   void __attribute__((noinline, noclone))             \
> +   test_##TYPE (TYPE *restrict a, TYPE *restrict b,    \
> +              int step)                                \
> +   {                                                   \
> +     step = step & M;                                  \
> +     for (int i = 0; i < N; ++i)                               \
> +       {                                                       \
> +       TYPE r1 = a[i * step + 0] += 1;                 \
> +       a[i * step + 1] += 2;                           \
> +       a[i * step + 2] += 4;                           \
> +       a[i * step + 3] += 8;                           \
> +       b[i] += r1;                                     \
> +       }                                                       \
> +   }                                                   \
> +                                                       \
> +   void __attribute__((noinline, noclone))             \
> +   ref_##TYPE (TYPE *restrict a, TYPE *restrict b,     \
> +             int step)                                 \
> +   {                                                   \
> +     for (unsigned short i = 0; i < N; ++i)            \
> +       {                                                       \
> +       TYPE r1 = a[i * step + 0] += 1;                 \
> +       a[i * step + 1] += 2;                           \
> +       a[i * step + 2] += 4;                           \
> +       a[i * step + 3] += 8;                           \
> +       b[i] += r1;                                     \
> +       asm volatile ("");                              \
> +       }                                                       \
> +   }
> +
> + #define DO_TEST(TYPE)                                 \
> +   for (int j = 0; j <= M; ++j)                                \
> +     {                                                 \
> +       TYPE a1[N * M], a2[N * M], b1[N], b2[N];                \
> +       for (int i = 0; i < N * M; ++i)                 \
> +       a1[i] = a2[i] = TEST_VALUE1 (i);                \
> +       for (int i = 0; i < N; ++i)                     \
> +       b1[i] = b2[i] = TEST_VALUE2 (i);                \
> +       test_##TYPE (a1, b1, j);                                \
> +       ref_##TYPE (a2, b2, j);                         \
> +       if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0)        \
> +       __builtin_abort ();                             \
> +       if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0)        \
> +       __builtin_abort ();                             \
> +     }
> +
> + FOR_EACH_TYPE (ADD_TEST)
> +
> + int
> + main (void)
> + {
> +   FOR_EACH_TYPE (DO_TEST)
> +   return 0;
> + }
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 2\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 3\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 4\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]*\) >= 4} "vect" { target vect_int } } } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 4\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 6\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 8\)} "vect" { target vect_int } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 8\)} "vect" { target { vect_int || vect_float } }} } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 12\)} "vect" { target { vect_int || vect_float } }} } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 16\)} "vect" { target { vect_int || vect_float } }} } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } }} } */
> +
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 16\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 24\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 32\)} "vect" { target vect_double } } } */
> + /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,62 ----
> + /* { dg-do run } */
> +
> + #define N 200
> + #define DIST 32
> +
> + typedef signed char sc;
> + typedef unsigned char uc;
> + typedef signed short ss;
> + typedef unsigned short us;
> + typedef int si;
> + typedef unsigned int ui;
> + typedef signed long long sll;
> + typedef unsigned long long ull;
> +
> + #define FOR_EACH_TYPE(M) \
> +   M (sc) M (uc) \
> +   M (ss) M (us) \
> +   M (si) M (ui) \
> +   M (sll) M (ull) \
> +   M (float) M (double)
> +
> + #define TEST_VALUE(I) ((I) * 5 / 2)
> +
> + #define ADD_TEST(TYPE)                                \
> +   TYPE a_##TYPE[N * 2];                               \
> +   void __attribute__((noinline, noclone))     \
> +   test_##TYPE (int x, int y)                  \
> +   {                                           \
> +     for (int i = 0; i < N; ++i)                       \
> +       a_##TYPE[i + x] += a_##TYPE[i + y];     \
> +   }
> +
> + #define DO_TEST(TYPE)                                         \
> +   for (int i = 0; i < DIST * 2; ++i)                          \
> +     {                                                         \
> +       for (int j = 0; j < N + DIST * 2; ++j)                  \
> +       a_##TYPE[j] = TEST_VALUE (j);                           \
> +       test_##TYPE (i, DIST);                                  \
> +       for (int j = 0; j < N + DIST * 2; ++j)                  \
> +       {                                                       \
> +         TYPE expected;                                        \
> +         if (j < i || j >= i + N)                              \
> +           expected = TEST_VALUE (j);                          \
> +         else if (i <= DIST)                                   \
> +           expected = ((TYPE) TEST_VALUE (j)                   \
> +                       + (TYPE) TEST_VALUE (j - i + DIST));    \
> +         else                                                  \
> +           expected = ((TYPE) TEST_VALUE (j)                   \
> +                       + a_##TYPE[j - i + DIST]);              \
> +         if (expected != a_##TYPE[j])                          \
> +           __builtin_abort ();                                 \
> +       }                                                       \
> +     }
> +
> + FOR_EACH_TYPE (ADD_TEST)
> +
> + int
> + main (void)
> + {
> +   FOR_EACH_TYPE (DO_TEST)
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c      2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,55 ----
> + /* { dg-do run } */
> +
> + #define N 200
> + #define M 4
> +
> + typedef signed char sc;
> + typedef unsigned char uc;
> + typedef signed short ss;
> + typedef unsigned short us;
> + typedef int si;
> + typedef unsigned int ui;
> + typedef signed long long sll;
> + typedef unsigned long long ull;
> +
> + #define FOR_EACH_TYPE(M) \
> +   M (sc) M (uc) \
> +   M (ss) M (us) \
> +   M (si) M (ui) \
> +   M (sll) M (ull) \
> +   M (float) M (double)
> +
> + #define TEST_VALUE(I) ((I) * 5 / 2)
> +
> + #define ADD_TEST(TYPE)                                \
> +   void __attribute__((noinline, noclone))     \
> +   test_##TYPE (TYPE *a, TYPE *b)              \
> +   {                                           \
> +     for (int i = 0; i < N; i += 2)            \
> +       {                                               \
> +       a[i + 0] = b[i + 0] + 2;                \
> +       a[i + 1] = b[i + 1] + 3;                \
> +       }                                               \
> +   }
> +
> + #define DO_TEST(TYPE)                                 \
> +   for (int j = 1; j < M; ++j)                         \
> +     {                                                 \
> +       TYPE a[N + M];                                  \
> +       for (int i = 0; i < N + M; ++i)                 \
> +       a[i] = TEST_VALUE (i);                          \
> +       test_##TYPE (a + j, a);                         \
> +       for (int i = 0; i < N; i += 2)                  \
> +       if (a[i + j] != (TYPE) (a[i] + 2)               \
> +           || a[i + j + 1] != (TYPE) (a[i + 1] + 3))   \
> +         __builtin_abort ();                           \
> +     }
> +
> + FOR_EACH_TYPE (ADD_TEST)
> +
> + int
> + main (void)
> + {
> +   FOR_EACH_TYPE (DO_TEST)
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_strided_load_8.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_strided_load_8.c       2017-11-17 22:11:53.266001803 +0000
> ***************
> *** 0 ****
> --- 1,15 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + void
> + foo (double *x, int m)
> + {
> +   for (int i = 0; i < 256; ++i)
> +     x[i * m] += x[i * m];
> + }
> +
> + /* { dg-final { scan-assembler-times {\tcbz\tw1,} 1 } } */
> + /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, } 1 } } */
> + /* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, } 1 } } */
> + /* { dg-final { scan-assembler-times {\tldr\t} 1 } } */
> + /* { dg-final { scan-assembler-times {\tstr\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_1.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_1.c 2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,27 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE int
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, unsigned short n, long m __attribute__((unused)))
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * n];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> + /* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> + /* { dg-final { scan-assembler-not {, 1024} } } */
> + /* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> + /* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
> + /* Two range checks and a check for n being zero.  */
> + /* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_1.h
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_1.h 2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,61 ----
> + extern void abort (void) __attribute__ ((noreturn));
> +
> + #define MARGIN 6
> +
> + void __attribute__ ((weak, optimize ("no-tree-vectorize")))
> + test (int n, int m, int offset)
> + {
> +   int abs_n = (n < 0 ? -n : n);
> +   int abs_m = (m < 0 ? -m : m);
> +   int max_i = (abs_n > abs_m ? abs_n : abs_m);
> +   int abs_offset = (offset < 0 ? -offset : offset);
> +   int size = MARGIN * 2 + max_i * SIZE + abs_offset;
> +   TYPE *array = (TYPE *) __builtin_alloca (size * sizeof (TYPE));
> +   for (int i = 0; i < size; ++i)
> +     array[i] = i;
> +   int base_x = offset < 0 ? MARGIN - offset : MARGIN;
> +   int base_y = offset < 0 ? MARGIN : MARGIN + offset;
> +   int start_x = n < 0 ? base_x - n * (SIZE - 1) : base_x;
> +   int start_y = m < 0 ? base_y - m * (SIZE - 1) : base_y;
> +   f (&array[start_x], &array[start_y], n, m);
> +   int j = 0;
> +   int start = (n < 0 ? size - 1 : 0);
> +   int end = (n < 0 ? -1 : size);
> +   int inc = (n < 0 ? -1 : 1);
> +   for (int i = start; i != end; i += inc)
> +     {
> +       if (j == SIZE || i != start_x + j * n)
> +         {
> +           if (array[i] != i)
> +             abort ();
> +         }
> +       else if (n == 0)
> +         {
> +           TYPE sum = i;
> +           for (; j < SIZE; j++)
> +             {
> +               int next_y = start_y + j * m;
> +               if (n >= 0 ? next_y < i : next_y > i)
> +                 sum += array[next_y];
> +               else if (next_y == i)
> +                 sum += sum;
> +               else
> +                 sum += next_y;
> +             }
> +           if (array[i] != sum)
> +             abort ();
> +         }
> +       else
> +         {
> +           int next_y = start_y + j * m;
> +           TYPE base = i;
> +           if (n >= 0 ? next_y < i : next_y > i)
> +             base += array[next_y];
> +           else
> +             base += next_y;
> +           if (array[i] != base)
> +             abort ();
> +           j += 1;
> +         }
> +     }
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_1_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_1_run.c     2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,14 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_1.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = 0; n < 10; ++n)
> +     for (int offset = -33; offset <= 33; ++offset)
> +       test (n, n, offset);
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_2.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_2.c 2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,25 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE int
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, unsigned short n, unsigned short m)
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * m];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> + /* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
> + /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x[0-9]+, 10, 16} 2 } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
> + /* Two range checks and a check for n being zero.  (m being zero is OK.)  */
> + /* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_2_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_2_run.c     2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_2.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = 0; n < 10; ++n)
> +     for (int m = 0; m < 10; ++m)
> +       for (int offset = -17; offset <= 17; ++offset)
> +         {
> +           test (n, m, offset);
> +           test (n, m, offset + n * (SIZE - 1));
> +         }
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_3.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_3.c 2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,27 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE int
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, int n, long m __attribute__((unused)))
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * n];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> + /* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> + /* { dg-final { scan-assembler-not {, 1024} } } */
> + /* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> + /* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> + /* Two range checks and a check for n being zero.  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_3_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_3_run.c     2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,14 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_3.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int offset = -33; offset <= 33; ++offset)
> +       test (n, n, offset);
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_4.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_4.c 2017-11-17 22:11:53.266931558 +0000
> ***************
> *** 0 ****
> --- 1,25 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE int
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, int n, int m)
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * m];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> + /* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
> + /* { dg-final { scan-assembler-times {\tsbfiz\tx[0-9]+, x[0-9]+, 10, 32} 2 } } */
> + /* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
> + /* { dg-final { scan-assembler {\tcmp\tw3, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
> + /* Two range checks and a check for n being zero.  (m being zero is OK.)  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_4_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_4_run.c     2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_4.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int m = -10; m < 10; ++m)
> +       for (int offset = -17; offset <= 17; ++offset)
> +         {
> +           test (n, m, offset);
> +           test (n, m, offset + n * (SIZE - 1));
> +         }
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_5.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_5.c 2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,27 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE double
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * n];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
> + /* Should multiply by (VF-1)*8 rather than (257-1)*8.  */
> + /* { dg-final { scan-assembler-not {, 2048} } } */
> + /* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> + /* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
> + /* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> + /* Two range checks and a check for n being zero.  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_5_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_5_run.c     2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,14 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_5.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int offset = -33; offset <= 33; ++offset)
> +       test (n, n, offset);
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_6.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_6.c 2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,25 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE long
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, long n, long m)
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i * m];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
> + /* Should multiply by (257-1)*8 rather than (VF-1)*8.  */
> + /* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 2 } } */
> + /* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
> + /* Two range checks and a check for n being zero.  (m being zero is OK.)  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_6_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_6_run.c     2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,18 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_6.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int m = -10; m < 10; ++m)
> +       for (int offset = -17; offset <= 17; ++offset)
> +         {
> +           test (n, m, offset);
> +           test (n, m, offset + n * (SIZE - 1));
> +         }
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_7.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_7.c 2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,26 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE double
> + #define SIZE 257
> +
> + void __attribute__ ((weak))
> + f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i * n] += y[i];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
> + /* Should multiply by (257-1)*8 rather than (VF-1)*8.  */
> + /* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x1, 2048} 1 } } */
> + /* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
> + /* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> + /* Two range checks and a check for n being zero.  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_7_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_7_run.c     2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,14 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_7.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int offset = -33; offset <= 33; ++offset)
> +       test (n, 1, offset);
> +   return 0;
> + }
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_8.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_8.c 2017-11-17 22:11:53.267861313 +0000
> ***************
> *** 0 ****
> --- 1,26 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #define TYPE long
> + #define SIZE 257
> +
> + void
> + f (TYPE *x, TYPE *y, long n __attribute__((unused)), long m)
> + {
> +   for (int i = 0; i < SIZE; ++i)
> +     x[i] += y[i * m];
> + }
> +
> + /* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
> + /* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
> + /* Should multiply by (257-1)*8 rather than (VF-1)*8.  */
> + /* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x0, 2048} 1 } } */
> + /* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
> + /* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> + /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> + /* Two range checks only; doesn't matter whether m is zero.  */
> + /* { dg-final { scan-assembler {\tcmp\t} } } */
> + /* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve_var_stride_8_run.c
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gcc.target/aarch64/sve_var_stride_8_run.c     2017-11-17 22:11:53.269720823 +0000
> ***************
> *** 0 ****
> --- 1,14 ----
> + /* { dg-do run { target { aarch64_sve_hw } } } */
> + /* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> + #include "sve_var_stride_8.c"
> + #include "sve_var_stride_1.h"
> +
> + int
> + main (void)
> + {
> +   for (int n = -10; n < 10; ++n)
> +     for (int offset = -33; offset <= 33; ++offset)
> +       test (1, n, offset);
> +   return 0;
> + }
> Index: gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
> ===================================================================
> *** /dev/null   2017-11-14 14:28:07.424493901 +0000
> --- gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90       2017-11-17 22:11:53.269720823 +0000
> ***************
> *** 0 ****
> --- 1,102 ----
> + ! { dg-do run }
> + ! { dg-additional-options "-fno-inline" }
> +
> + #define N 200
> +
> + #define TEST_VALUE(I) ((I) * 5 / 2)
> +
> + subroutine setup(a)
> +   real :: a(N)
> +   do i = 1, N
> +      a(i) = TEST_VALUE(i)
> +   end do
> + end subroutine
> +
> + subroutine check(a, x, gap)
> +   real :: a(N), temp, x
> +   integer :: gap
> +   do i = 1, N - gap
> +      temp = a(i + gap) + x
> +      if (a(i) /= temp) call abort
> +   end do
> +   do i = N - gap + 1, N
> +      temp = TEST_VALUE(i)
> +      if (a(i) /= temp) call abort
> +   end do
> + end subroutine
> +
> + subroutine testa(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base, n
> +   do i = n, 2, -1
> +      a(base + i - 1) = a(base + i) + x
> +   end do
> + end subroutine testa
> +
> + subroutine testb(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base
> +   do i = n, 4, -1
> +      a(base + i - 3) = a(base + i) + x
> +   end do
> + end subroutine testb
> +
> + subroutine testc(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base
> +   do i = n, 8, -1
> +      a(base + i - 7) = a(base + i) + x
> +   end do
> + end subroutine testc
> +
> + subroutine testd(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base
> +   do i = n, 16, -1
> +      a(base + i - 15) = a(base + i) + x
> +   end do
> + end subroutine testd
> +
> + subroutine teste(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base
> +   do i = n, 32, -1
> +      a(base + i - 31) = a(base + i) + x
> +   end do
> + end subroutine teste
> +
> + subroutine testf(a, x, base, n)
> +   real :: a(n), x
> +   integer :: base
> +   do i = n, 64, -1
> +      a(base + i - 63) = a(base + i) + x
> +   end do
> + end subroutine testf
> +
> + program main
> +   real :: a(N)
> +
> +   call setup(a)
> +   call testa(a, 91.0, 0, N)
> +   call check(a, 91.0, 1)
> +
> +   call setup(a)
> +   call testb(a, 55.0, 0, N)
> +   call check(a, 55.0, 3)
> +
> +   call setup(a)
> +   call testc(a, 72.0, 0, N)
> +   call check(a, 72.0, 7)
> +
> +   call setup(a)
> +   call testd(a, 69.0, 0, N)
> +   call check(a, 69.0, 15)
> +
> +   call setup(a)
> +   call teste(a, 44.0, 0, N)
> +   call check(a, 44.0, 31)
> +
> +   call setup(a)
> +   call testf(a, 39.0, 0, N)
> +   call check(a, 39.0, 63)
> + end program
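
For anyone reading the var_stride tests above, here is a rough C sketch of the kind of non-overlap test that the seg_len/access_size description implies. It assumes each access is summarised by a base address, a seg_len that may be negative, and a positive access_size. The helper name ranges_independent and the use of intptr_t are my own assumptions for illustration; this is not code from the patch, and the check the vectoriser actually emits may be formed differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.  Return true if the bytes
   touched by access A cannot overlap the bytes touched by access B.

   BASE is the address used in the first iteration of interest, SEG_LEN
   is the distance travelled from the first iteration to the last (it is
   negative for a negative step) and ACCESS_SIZE is the positive number
   of bytes accessed in each iteration.  */
static bool
ranges_independent (intptr_t base_a, intptr_t seg_len_a,
                    intptr_t access_size_a,
                    intptr_t base_b, intptr_t seg_len_b,
                    intptr_t access_size_b)
{
  /* The touched bytes form the half-open range [lo, hi).  A negative
     seg_len moves the low end; the access size always extends the
     high end.  */
  intptr_t lo_a = seg_len_a < 0 ? base_a + seg_len_a : base_a;
  intptr_t hi_a = (seg_len_a < 0 ? base_a : base_a + seg_len_a)
                  + access_size_a;
  intptr_t lo_b = seg_len_b < 0 ? base_b + seg_len_b : base_b;
  intptr_t hi_b = (seg_len_b < 0 ? base_b : base_b + seg_len_b)
                  + access_size_b;

  /* Independent if one range ends before the other begins.  */
  return hi_a <= lo_b || hi_b <= lo_a;
}
```

As an example, with 4-byte elements, a step of 8 bytes and 4 iterations of interest, seg_len is 8 * 3 = 24 and the touched range is [base, base + 28), so a second access based at base + 28 would pass the check while one based at base + 24 would not.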
