On Wed, 26 Feb 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Wednesday, February 26, 2025 1:52 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > Subject: RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load
> > [PR118464]
> > 
> > On Wed, 26 Feb 2025, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguent...@suse.de>
> > > > Sent: Wednesday, February 26, 2025 12:30 PM
> > > > To: Tamar Christina <tamar.christ...@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > > > Subject: Re: [3/3 PATCH v3]middle-end: delay checking for alignment to 
> > > > load
> > > > [PR118464]
> > > >
> > > > On Tue, 25 Feb 2025, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This fixes two PRs on Early break vectorization by delaying the safety
> > > > > checks to vectorizable_load when the VF, VMAT and vectype are all known.
> > > > >
> > > > > This patch does add two new restrictions:
> > > > >
> > > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject
> > > > >    uneven group sizes, as they become misaligned every other iteration
> > > > >    and so may cross a page unwittingly (see the sketch after this list).
> > > > >
> > > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject
> > > > >    vectorization if we cannot peel for alignment, as the alignment
> > > > >    requirement is quite large at GROUP_SIZE * vectype_size.  This is
> > > > >    unlikely to ever be beneficial so we don't support it for now.
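
[For illustration only; this loop is hypothetical and not from the patch.
Restriction 1 above concerns early-break loops whose loads form a group whose
size is not a power of two, e.g. three reads per iteration.  Because the group
size is not a power of two, peeling cannot make every vector iteration
aligned, so a speculative vector read could stray across a page boundary.]

  char buf[1024];

  unsigned
  scan3 (char x, int n)
  {
    /* Three reads per scalar iteration form a load group of size 3.  */
    for (int i = 0; i < n; i += 3)
      if (buf[i] > x || buf[i + 1] > x || buf[i + 2] > x)
        return 1;   /* early break */
    return 0;
  }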
> > > > >
> > > > > There are other steps documented inside the code itself so that the 
> > > > > reasoning
> > > > > is next to the code.
> > > > >
> > > > > Note that for VLA I have still left this fully disabled when not 
> > > > > working on a
> > > > > fixed buffer.
> > > > >
> > > > > For VLA targets like SVE we return the element alignment as the desired
> > > > > vector alignment.  This means that the loads are never misaligned and so,
> > > > > annoyingly, it won't ever need to peel.
> > > > >
> > > > > So what I think needs to happen in GCC 16 is the following.
> > > > >
> > > > > 1. during vect_compute_data_ref_alignment we need to take the max of
> > > > >    POLY_VALUE_MIN and vector_alignment.
> > > > >
> > > > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard
> > > > >    add a check that ncopies * vectype does not exceed POLY_VALUE_MAX which
> > > > >    we use as a proxy for pagesize.
> > > > >
> > > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > > > >    vect_determine_partial_vectors_and_peeling since the first iteration
> > > > >    has to be partial.  If !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to
> > > > >    fail to vectorize.
> > > > >
> > > > > 4. Create a default mask to be used, so that
> > > > >    vect_use_loop_mask_for_alignment_p becomes true and we generate the
> > > > >    peeled check through loop control for partial loops.  From what I can
> > > > >    tell this won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P since they
> > > > >    don't have any peeling support at all in the compiler.  That would need
> > > > >    to be done independently from the above.
> > > > >
> > > > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > > > downstream.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > > -m32, -m64 and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >       PR tree-optimization/118464
> > > > >       PR tree-optimization/116855
> > > > >       * doc/invoke.texi (min-pagesize): Update docs with vectorizer 
> > > > > use.
> > > > >       * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > > > >       Delay checks.
> > > > >       (vect_compute_data_ref_alignment): Remove alignment checks and move
> > > > >       to get_load_store_type, increase group access alignment.
> > > > >       (vect_enhance_data_refs_alignment): Add note to comment needing
> > > > >       investigating.
> > > > >       (vect_analyze_data_refs_alignment): Likewise.
> > > > >       (vect_supportable_dr_alignment): For group loads look at first 
> > > > > DR.
> > > > >       * tree-vect-stmts.cc (get_load_store_type):
> > > > >       Perform safety checks for early break pfa.
> > > > >       * tree-vectorizer.h (dr_set_safe_speculative_read_required,
> > > > >       dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS):
> > > > New.
> > > > >       (need_peeling_for_alignment): Renamed to...
> > > > >       (safe_speculative_read_required): ... This.
> > > > >       (class dr_vec_info): Add scalar_access_known_in_bounds.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >       PR tree-optimization/118464
> > > > >       PR tree-optimization/116855
> > > > >       * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes 
> > > > > because the
> > > > >       load type is relaxed later.
> > > > >       * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > > >       * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes 
> > > > > targets
> > > > >       * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > > > >       * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > > > >       * gcc.dg/vect/vect-early-break_39.c: Update testcase for 
> > > > > misalignment.
> > > > >       * gcc.dg/vect/vect-early-break_53.c: Likewise.
> > > > >       * gcc.dg/vect/vect-early-break_56.c: Likewise.
> > > > >       * gcc.dg/vect/vect-early-break_57.c: Likewise.
> > > > >       * gcc.dg/vect/vect-early-break_81.c: Likewise.
> > > > >
> > > > > ---
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index
> > > >
> > ca8e468f3f2dbf68c959f74d8fec48c79463504d..fff2874b326d605fc2656adf4ab
> > > > 6eb5bd5d42d71 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will
> > > > register in a basic block.
> > > > >  Work bound when discovering transitive relations from existing 
> > > > > relations.
> > > > >
> > > > >  @item min-pagesize
> > > > > -Minimum page size for warning purposes.
> > > > > +Minimum page size for warning and early break vectorization purposes.
> > > > >
> > > > >  @item openacc-kernels
> > > > >  Specify mode of OpenACC `kernels' constructs handling.
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > > index
> > > >
> > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > > > 93c950629f3231554 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > > > @@ -55,7 +55,9 @@ int main()
> > > > >         }
> > > > >      }
> > > > >    rephase ();
> > > > > +#pragma GCC novector
> > > > >    for (i = 0; i < 32; ++i)
> > > > > +#pragma GCC novector
> > > > >      for (j = 0; j < 3; ++j)
> > > > >  #pragma GCC novector
> > > > >        for (k = 0; k < 3; ++k)
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > > index
> > > >
> > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..f99c57be0adc4d49035b8a75c
> > > > 72d4a5b04cc05c7 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > > > @@ -5,7 +5,8 @@
> > > > >  /* { dg-additional-options "-O3" } */
> > > > >  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } 
> > > > > } } */
> > > > >
> > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* Arm and -m32 create a group size of 3 here, which we can't 
> > > > > support yet.
> > > > AArch64 makes elementwise accesses here.  */
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target {
> > aarch64*-*-
> > > > * } } } } */
> > > > >
> > > > >  typedef struct filter_list_entry {
> > > > >    const char *name;
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > > > c83ab569fc9fbde126
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > > > @@ -0,0 +1,25 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-additional-options "-O3" } */
> > > > > +
> > > > > +int a, b, c, d, e, f;
> > > > > +short g[1];
> > > > > +int main() {
> > > > > +  int h;
> > > > > +  while (a) {
> > > > > +    while (h)
> > > > > +      ;
> > > > > +    for (b = 2; b; b--) {
> > > > > +      while (c)
> > > > > +        ;
> > > > > +      f = g[a];
> > > > > +      if (d)
> > > > > +        break;
> > > > > +    }
> > > > > +    while (e)
> > > > > +      ;
> > > > > +  }
> > > > > +  return 0;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > > > da7a383ad7f6ceb0a7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-additional-options "-O3" } */
> > > > > +
> > > > > +char string[1020];
> > > > > +
> > > > > +char * find(int n, char c)
> > > > > +{
> > > > > +    for (int i = 1; i < n; i++) {
> > > > > +        if (string[i] == c)
> > > > > +            return &string[i];
> > > > > +    }
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using 
> > > > > peeling"
> > "vect"
> > > > } } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..82d473a279ce060c55028
> > > > 9c61729d9f9b56f0d2a
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* Alignment requirement too big, load lanes targets can't safely 
> > > > > vectorize
> > this.
> > > > */
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { !
> > > > vect_load_lanes } } } } */
> > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> > peeling"
> > > > "vect" { target { ! vect_load_lanes } } } } */
> > > > > +
> > > > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, 
> > > > > int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 0; i < (n - 2); i+=2)
> > > > > + {
> > > > > +   if (vect_a[i] > x || vect_a[i+2] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect_b[i] = x;
> > > > > +   vect_b[i+1] = x+1;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > > > ca29a5f3f9d3f6e0d1
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > > > @@ -0,0 +1,19 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-additional-options "-O3" } */
> > > > > +
> > > > > +char string[1020];
> > > > > +
> > > > > +char * find(int n, char c)
> > > > > +{
> > > > > +    for (int i = 0; i < n; i++) {
> > > > > +        if (string[i] == c)
> > > > > +            return &string[i];
> > > > > +    }
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> > peeling"
> > > > "vect" } } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > > > 9da423cf54b5e564d6f
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-additional-options "-O3" } */
> > > > > +
> > > > > +char string[1020] __attribute__((aligned(1)));
> > > > > +
> > > > > +char * find(int n, char c)
> > > > > +{
> > > > > +    for (int i = 1; i < n; i++) {
> > > > > +        if (string[i] == c)
> > > > > +            return &string[i];
> > > > > +    }
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using 
> > > > > peeling"
> > "vect"
> > > > } } */
> > > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } 
> > > > > } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > > > 7ceea1c065fcc6ae9
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-additional-options "-O3" } */
> > > > > +
> > > > > +char string[1020] __attribute__((aligned(1)));
> > > > > +
> > > > > +char * find(int n, char c)
> > > > > +{
> > > > > +    for (int i = 0; i < n; i++) {
> > > > > +        if (string[i] == c)
> > > > > +            return &string[i];
> > > > > +    }
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> > peeling"
> > > > "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } 
> > > > > } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..ca95be44e92e32769da1d
> > > > 1e9b740ae54682a3d55
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > > > @@ -0,0 +1,23 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +
> > > > > +unsigned test4(char x, char *vect, int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 0; i < n; i++)
> > > > > + {
> > > > > +   if (vect[i] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect[i] = x;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "Alignment of access forced using 
> > > > > peeling"
> > "vect"
> > > > } } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > > > 64a61c97b1b6268743
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > > > @@ -0,0 +1,23 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +
> > > > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 1; i < n; i++)
> > > > > + {
> > > > > +   if (vect_a[i] > x || vect_b[i] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect_a[i] = x;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be 
> > > > > applied"
> > "vect" }
> > > > } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > > > 50776299531824ce9c
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > > > @@ -0,0 +1,23 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* This should be vectorizable through load_lanes and linear 
> > > > > targets.  */
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +
> > > > > +unsigned test4(char x, char * restrict vect_a, char * restrict 
> > > > > vect_b, int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 0; i < n; i+=2)
> > > > > + {
> > > > > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect_b[i] = x;
> > > > > +   vect_b[i+1] = x+1;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56
> > > > cecc60750394e10f2b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > > > @@ -0,0 +1,25 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +
> > > > > +char vect_a[1025];
> > > > > +char vect_b[1025];
> > > > > +
> > > > > +unsigned test4(char x, int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > > + {
> > > > > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect_b[i] = x;
> > > > > +   vect_b[i+1] = x+1;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..31e209620925353948325
> > > > 3efc17499a53d112894
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > > > @@ -0,0 +1,28 @@
> > > > > +/* { dg-add-options vect_early_break } */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_early_break } */
> > > > > +/* { dg-require-effective-target vect_int } */
> > > > > +
> > > > > +/* { dg-additional-options "-Ofast" } */
> > > > > +
> > > > > +/* Group size is uneven, load lanes targets can't safely vectorize 
> > > > > this.  */
> > > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using
> > peeling"
> > > > "vect" } } */
> > > > > +
> > > > > +
> > > > > +char vect_a[1025];
> > > > > +char vect_b[1025];
> > > > > +
> > > > > +unsigned test4(char x, int n)
> > > > > +{
> > > > > + unsigned ret = 0;
> > > > > + for (int i = 1; i < (n - 2); i+=2)
> > > > > + {
> > > > > +   if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > > > +     return 1;
> > > > > +
> > > > > +   vect_b[i] = x;
> > > > > +   vect_b[i+1] = x+1;
> > > > > + }
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > > index
> > > >
> > 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46
> > > > a462238c3b5825ef 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > > > > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> > > > >   return ret;
> > > > >  }
> > > > >
> > > > > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" 
> > > > > "vect" } } */
> > > > > +/* Cannot safely vectorize this due to the group misalignment.  */
> > > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in 
> > > > > function" 0 "vect"
> > } }
> > > > */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > > > > index
> > > >
> > a02d5986ba3cfc117b19305c5e96711299996931..d4fd0d39a25a5659e3d9452
> > > > b79f3e0fabba8b3c0 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > > > > @@ -2,6 +2,7 @@
> > > > >  /* { dg-do compile } */
> > > > >  /* { dg-require-effective-target vect_early_break } */
> > > > >  /* { dg-require-effective-target vect_int } */
> > > > > +/* { dg-require-effective-target vect_partial_vectors } */
> > > > >
> > > > >  void abort ();
> > > > >  int a[64], b[64];
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > > > > index
> > > >
> > 9096f66647c7b3cb430562d35f8ce076244f7c11..b35e737fa3b9137cd745c14f
> > > > 7ad915a3f81c38c4 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > > > > @@ -4,6 +4,7 @@
> > > > >  /* { dg-require-effective-target vect_int } */
> > > > >  /* { dg-add-options bind_pic_locally } */
> > > > >  /* { dg-require-effective-target vect_early_break_hw } */
> > > > > +/* { dg-require-effective-target vect_partial_vectors } */
> > > > >
> > > > >  #include <stdarg.h>
> > > > >  #include "tree-vect.h"
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > > > > index
> > > >
> > 319bd125c3156f13c300ff2b94d269bb9ec29e97..a4886654f152b2c0568286fe
> > > > bea2b31cb7be8499 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > > > > @@ -5,8 +5,9 @@
> > > > >
> > > > >  /* { dg-additional-options "-Ofast" } */
> > > > >
> > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> > > > > +/* Multiple loads of different alignments; we can't peel this.  */
> > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } 
> > > > > */
> > > > >
> > > > >  void abort ();
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> > > > > index
> > > >
> > 8a8c076ba92ca6fef419cb23b457a23555c61c64..b58a4611d6b8d86f0247d9ea
> > > > 44ab4750473589a9 100644
> > > > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> > > > > @@ -5,8 +5,9 @@
> > > > >
> > > > >  /* { dg-additional-options "-Ofast" } */
> > > > >
> > > > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > > -/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> > > > > +/* Multiple loads with different misalignments.  Can't peel; need partial loop support.  */
> > > > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } 
> > > > > */
> > > > >  void abort ();
> > > > >
> > > > >  unsigned short sa[32];
> > > > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > > > index
> > > >
> > 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..3df48c8611c8fd843efb145c7c
> > > > 1a0afeee25edfa 100644
> > > > > --- a/gcc/tree-vect-data-refs.cc
> > > > > +++ b/gcc/tree-vect-data-refs.cc
> > > > > @@ -731,7 +731,9 @@ vect_analyze_early_break_dependences
> > (loop_vec_info
> > > > loop_vinfo)
> > > > >         if (is_gimple_debug (stmt))
> > > > >           continue;
> > > > >
> > > > > -       stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > > > +       stmt_vec_info stmt_vinfo
> > > > > +         = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > > > > +       stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > > >         auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > > >         if (!dr_ref)
> > > > >           continue;
> > > > > @@ -748,26 +750,16 @@ vect_analyze_early_break_dependences
> > > > (loop_vec_info loop_vinfo)
> > > > >            bounded by VF so accesses are within range.  We only need 
> > > > > to check
> > > > >            the reads since writes are moved to a safe place where if 
> > > > > we get
> > > > >            there we know they are safe to perform.  */
> > > > > -       if (DR_IS_READ (dr_ref)
> > > > > -           && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > > > +       if (DR_IS_READ (dr_ref))
> > > > >           {
> > > > > -           if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > > > -               || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > > > -             {
> > > > > -               const char *msg
> > > > > -                 = "early break not supported: cannot peel "
> > > > > -                   "for alignment, vectorization would read out of "
> > > > > -                   "bounds at %G";
> > > > > -               return opt_result::failure_at (stmt, msg, stmt);
> > > > > -             }
> > > > > -
> > > > > -           dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > > > > -           dr_info->need_peeling_for_alignment = true;
> > > > > +           dr_set_safe_speculative_read_required (stmt_vinfo, true);
> > > > > +           bool inbounds = ref_within_array_bound (stmt, DR_REF 
> > > > > (dr_ref));
> > > > > +           DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_vinfo))
> > > > = inbounds;
> > > > >
> > > > >             if (dump_enabled_p ())
> > > > >               dump_printf_loc (MSG_NOTE, vect_location,
> > > > > -                              "marking DR (read) as needing peeling 
> > > > > for "
> > > > > -                              "alignment at %G", stmt);
> > > > > +                              "marking DR (read) as possibly needing 
> > > > > peeling "
> > > > > +                              "for alignment at %G", stmt);
> > > > >           }
> > > > >
> > > > >         if (DR_IS_READ (dr_ref))
> > > > > @@ -1326,9 +1318,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > >     Compute the misalignment of the data reference DR_INFO when 
> > > > > vectorizing
> > > > >     with VECTYPE.
> > > > >
> > > > > -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, 
> > > > > *RESULT will
> > > > > -   be set appropriately on failure (but is otherwise left unchanged).
> > > > > -
> > > > >     Output:
> > > > >     1. initialized misalignment info for DR_INFO
> > > > >
> > > > > @@ -1337,7 +1326,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > > > >
> > > > >  static void
> > > > >  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info 
> > > > > *dr_info,
> > > > > -                              tree vectype, opt_result *result = 
> > > > > nullptr)
> > > > > +                              tree vectype)
> > > > >  {
> > > > >    stmt_vec_info stmt_info = dr_info->stmt;
> > > > >    vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > > > @@ -1365,63 +1354,29 @@ vect_compute_data_ref_alignment (vec_info
> > > > *vinfo, dr_vec_info *dr_info,
> > > > >      = exact_div (targetm.vectorize.preferred_vector_alignment 
> > > > > (vectype),
> > > > >                BITS_PER_UNIT);
> > > > >
> > > > > -  /* If this DR needs peeling for alignment for correctness, we must
> > > > > -     ensure the target alignment is a constant power-of-two multiple 
> > > > > of the
> > > > > -     amount read per vector iteration (overriding the above hook 
> > > > > where
> > > > > -     necessary).  */
> > > > > -  if (dr_info->need_peeling_for_alignment)
> > > > > +  if (loop_vinfo
> > > > > +      && dr_safe_speculative_read_required (stmt_info))
> > > > >      {
> > > > > -      /* Vector size in bytes.  */
> > > > > -      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> > (vectype));
> > > > > -
> > > > > -      /* We can only peel for loops, of course.  */
> > > > > -      gcc_checking_assert (loop_vinfo);
> > > > > -
> > > > > -      /* Calculate the number of vectors read per vector iteration.  
> > > > > If
> > > > > -      it is a power of two, multiply through to get the required
> > > > > -      alignment in bytes.  Otherwise, fail analysis since alignment
> > > > > -      peeling wouldn't work in such a case.  */
> > > > > -      poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > +      auto vectype_size
> > > > > +     = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > > > > +      poly_uint64 new_alignment = vf * vectype_size;
> > > > > +      /* If we have a grouped access we require that the alignment 
> > > > > be N * elem.
> > */
> > > > >        if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > -     num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > > > +     new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT
> > > > (stmt_info));
> > > > > -      auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > > > -      if (!pow2p_hwi (num_vectors))
> > > > > -     {
> > > > > -       *result = opt_result::failure_at (vect_location,
> > > > > -                                         "non-power-of-two num 
> > > > > vectors %u "
> > > > > -                                         "for DR needing peeling for 
> > > > > "
> > > > > -                                         "alignment at %G",
> > > > > -                                         num_vectors, 
> > > > > stmt_info->stmt);
> > > > > -       return;
> > > > > -     }
> > > > > -
> > > > > -      safe_align *= num_vectors;
> > > > > -      if (maybe_gt (safe_align, 4096U))
> > > > > -     {
> > > > > -       pretty_printer pp;
> > > > > -       pp_wide_integer (&pp, safe_align);
> > > > > -       *result = opt_result::failure_at (vect_location,
> > > > > -                                         "alignment required for 
> > > > > correctness"
> > > > > -                                         " (%s) may exceed page 
> > > > > size",
> > > > > -                                         pp_formatted_text (&pp));
> > > > > -       return;
> > > > > -     }
> > > > > -
> > > > > -      unsigned HOST_WIDE_INT multiple;
> > > > > -      if (!constant_multiple_p (vector_alignment, safe_align, 
> > > > > &multiple)
> > > > > -       || !pow2p_hwi (multiple))
> > > > > +      unsigned HOST_WIDE_INT target_alignment;
> > > > > +      if (new_alignment.is_constant (&target_alignment)
> > > > > +       && pow2p_hwi (target_alignment))
> > > > >       {
> > > > >         if (dump_enabled_p ())
> > > > >           {
> > > > >             dump_printf_loc (MSG_NOTE, vect_location,
> > > > > -                            "forcing alignment for DR from preferred 
> > > > > (");
> > > > > -           dump_dec (MSG_NOTE, vector_alignment);
> > > > > -           dump_printf (MSG_NOTE, ") to safe align (");
> > > > > -           dump_dec (MSG_NOTE, safe_align);
> > > > > -           dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > > > +                            "alignment increased due to early break 
> > > > > to ");
> > > > > +           dump_dec (MSG_NOTE, new_alignment);
> > > > > +           dump_printf (MSG_NOTE, " bytes.\n");
> > > > >           }
> > > > > -       vector_alignment = safe_align;
> > > > > +       vector_alignment = target_alignment;
> > > > >       }
> > > > >      }
> > > > >
> > > > > @@ -2487,6 +2442,8 @@ vect_enhance_data_refs_alignment
> > (loop_vec_info
> > > > loop_vinfo)
> > > > >        || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT
> > (loop_vinfo),
> > > > >                                      loop_preheader_edge (loop))
> > > > >        || loop->inner
> > > > > +      /* We don't currently maintain the LCSSA for prologue peeled inversed
> > > > > +      loops.  */
> > > > >        || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > > >      do_peeling = false;
> > > > >
> > > > > @@ -2950,12 +2907,9 @@ vect_analyze_data_refs_alignment
> > (loop_vec_info
> > > > loop_vinfo)
> > > > >         if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > > > >             && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != 
> > > > > dr_info->stmt)
> > > > >           continue;
> > > > > -       opt_result res = opt_result::success ();
> > > > > +
> > > > >         vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > > > -                                        STMT_VINFO_VECTYPE 
> > > > > (dr_info->stmt),
> > > > > -                                        &res);
> > > > > -       if (!res)
> > > > > -         return res;
> > > > > +                                        STMT_VINFO_VECTYPE 
> > > > > (dr_info->stmt));
> > > > >       }
> > > > >      }
> > > > >
> > > > > @@ -7219,7 +7173,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > > > dr_vec_info *dr_info,
> > > > >
> > > > >    if (misalignment == 0)
> > > > >      return dr_aligned;
> > > > > -  else if (dr_info->need_peeling_for_alignment)
> > > > > +  else if (dr_safe_speculative_read_required(stmt_info))
> > > >
> > > > space after dr_safe_speculative_read_required
> > > >
> > > > Depending on vectorizable_load this might be a bit restrictive (it
> > > > makes peeling do the magic, but we could check
> > > > dr_safe_speculative_read_required there as well).  But it's OK for
> > > > now at least.
> > > >
> > > > >      return dr_unaligned_unsupported;
> > > > >
> > > > >    /* For now assume all conditional loads/stores support unaligned
> > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > index
> > > >
> > 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..e0e5598babc7b119d408d8351
> > > > 2ae8c6047d4924c 100644
> > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > @@ -2597,6 +2597,165 @@ get_load_store_type (vec_info  *vinfo,
> > > > stmt_vec_info stmt_info,
> > > > >        return false;
> > > > >      }
> > > > >
> > > > > +  /* If this DR needs alignment for correctness, we must ensure the 
> > > > > target
> > > > > +     alignment is a constant power-of-two multiple of the amount 
> > > > > read per
> > > > > +     vector iteration or force masking.  */
> > > > > +  if (dr_safe_speculative_read_required (stmt_info))
> > > > > +    {
> > > > > +      /* We can only peel for loops, of course.  */
> > > > > +      gcc_checking_assert (loop_vinfo);
> > > > > +
> > > > > +      /* Check if we support the operation if early breaks are 
> > > > > needed.  Here we
> > > > > +      must ensure that we don't access any more than the scalar code 
> > > > > would
> > > > > +      have.  A masked operation would ensure this, so for these load 
> > > > > types
> > > > > +      force masking.  */
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +       && (*memory_access_type == VMAT_GATHER_SCATTER
> > > > > +           || *memory_access_type == VMAT_STRIDED_SLP))
> > > > > +     {
> > > > > +       if (dump_enabled_p ())
> > > > > +         dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +                          "early break not supported: cannot peel 
> > > > > for "
> > > > > +                          "alignment. With non-contiguous memory
> > > > vectorization"
> > > > > +                          " could read out of bounds at %G ",
> > > > > +                          STMT_VINFO_STMT (stmt_info));
> > > > > +       LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > > > > +     }
> > > > > +
> > > > > +      /* Vector size in bytes.  */
> > > > > +      poly_uint64 safe_align;
> > > > > +      if (nunits.is_constant ())
> > > > > +     safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> > > > > +      else
> > > > > +     safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR
> > > > (loop_vinfo),
> > > > > +                                        POLY_VALUE_MAX);
> > > > > +
> > > > > +      auto num_vectors = ncopies;
> > > >
> > > > You can't rely on ncopies for SLP, so in practice you'll see
> > > > num_vectors == 1 always (unless you force --param vect-force-slp=0).
> > > > I'm not sure why you need this test given you have the target alignment
> > > > check below?
> > >
> > > Yeah for SLP it doesn't matter, but as you said the param still exists 
> > > and I don't
> > > think we've introduced any SLP only code yet have we?
> > >
> > > If you prefer I can drop it. What I don't actually remember for the 
> > > non-SLP case
> > > is if the VF accounts for the ncopies growth as well? If so I agree this 
> > > is completely
> > > superfluous for SLP and non-SLP.
> > 
> > VF and GROUP_SIZE are correct, independent of whether we use SLP or
> > not, it's just that stupid ncopies vs. vec_num stuff that adds to
> > confusion.
> > 
> > So yes, please drop it.
> > 
> > > >
> > > > > +      if (!pow2p_hwi (num_vectors))
> > > > > +     {
> > > > > +       if (dump_enabled_p ())
> > > > > +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                          "non-power-of-two num vectors %u "
> > > > > +                          "for DR needing peeling for "
> > > > > +                          "alignment at %G",
> > > > > +                          num_vectors, STMT_VINFO_STMT (stmt_info));
> > > > > +       return false;
> > > > > +     }
> > > > > +
> > > > > +      auto target_alignment
> > > > > +     = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> > > > > +      unsigned HOST_WIDE_INT target_align;
> > > > > +      bool inbounds
> > > > > +     = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> > > > > +
> > > > > +      /* If the scalar loop is known to be in bounds, and we're 
> > > > > using scalar
> > > > > +      accesses then there's no need to check further.  */
> > > > > +      if (inbounds
> > > > > +       && *memory_access_type == VMAT_ELEMENTWISE)
> > > > > +     {
> > > > > +       *alignment_support_scheme = dr_aligned;
> > > > > +       return true;
> > > > > +     }
> > > > > +
> > > > > +      bool group_aligned = false;
> > > > > +      if (target_alignment.is_constant (&target_align)
> > > > > +       && nunits.is_constant ())
> > > > > +     {
> > > > > +       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > > > +       auto vectype_size
> > > > > +         = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > > > > +       poly_uint64 required_alignment = vf * vectype_size;
> > > > > +       /* If we have a grouped access we require that the alignment 
> > > > > be N * elem.
> > > > */
> > > > > +       if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > +         required_alignment *=
> > > > > +             DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > > > +       if (maybe_ne (target_alignment, required_alignment))
> > > >
> > > > Note ->target_alignment only holds if *alignment_support_scheme ==
> > > > dr_aligned but you don't seem to test this?  It would be OK to
> > > > have target_alignment be a multiple of required_alignment.
> > > >
> > >
> > > ACK, will change.
> > >
> > > > > +         {
> > > > > +           if (dump_enabled_p ())
> > > > > +             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                          "desired alignment %wu not met. Instead 
> > > > > got %wu "
> > > > > +                          "needed for DR alignment at %G",
> > > >
> > > > 'needed' sounds odd, omit it?
> > > >
> > > > > +                          required_alignment.to_constant (),
> > > > > +                          target_align, STMT_VINFO_STMT (stmt_info));
> > > > > +           return false;
> > > > > +         }
> > > > > +
> > > > > +       if (!pow2p_hwi (target_align))
> > > > > +         {
> > > > > +           if (dump_enabled_p ())
> > > > > +             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                          "non-power-of-two vector alignment %wd "
> > > > > +                          "needed for DR alignment at %G",
> > > >
> > > > likewise.
> > > >
> > > > > +                          target_align, STMT_VINFO_STMT (stmt_info));
> > > > > +           return false;
> > > > > +         }
> > > > > +       group_aligned = true;
> > > > > +     }
> > > > > +      else if (inbounds)
> > > >
> > > > so you have those early outs (return false) above, but here is a 
> > > > fallback?
> > > > Shouldn't this then be not an else if but
> > > >
> > > >     if (!group_aligned && inbounds)
> > > >
> > > > ?
> > >
> > > Ack, so you mean that those cases could still be handled by partial 
> > > vectors.
> > > Hmm that's true.
> > >
> > > >
> > > > > +     LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > > > > +      else
> > > >
> > > > and this else if (!group_aligned)?
> > > >
> > > > > +     return false;
> > > > > +
> > > > > +      /* Even if uneven group sizes are aligned on the first load, 
> > > > > the second
> > > > > +      iteration won't be.  As such reject non-power of 2 group 
> > > > > sizes.  */
> > > >
> > > > isn't this covered above?
> > >
> > > Ack, the group_size * above on the power of two alignment check should 
> > > cover
> > it.
> > >
> > > >
> > > > > +      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > +     {
> > > > > +       auto group_size = DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT
> > > > (stmt_info));
> > > > > +       if (!pow2p_hwi (group_size))
> > > > > +         {
> > > > > +           if (group_aligned || inbounds)
> > > > > +             LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) =
> > > > true;
> > > > > +           else
> > > > > +             {
> > > > > +               if (dump_enabled_p ())
> > > > > +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
> > > > > vect_location,
> > > > > +                          "early break not supported: uneven group 
> > > > > size, "
> > > > > +                          "vectorization could read out of bounds at 
> > > > > %G ",
> > > > > +                          STMT_VINFO_STMT (stmt_info));
> > > > > +               return false;
> > > > > +             }
> > > > > +         }
> > > > > +     }
> > > > > +
> > > > > +      safe_align *= num_vectors;
> > > > > +      /* For VLA we have to insert a runtime check that the vector loads
> > > > > +      per iteration don't exceed a page size.  For now we can use
> > > > > +      POLY_VALUE_MAX as a proxy as we can't peel for VLA.  */
> > > >
> > > > And this should be a check on required_alignment computed above,
> > > > resulting in group_aligned = false but still handling it in the
> > > > inbounds case?
> > > >
> > >
> > > ACK, since if larger than a pagesize it can be handled with partial
> > > loops for the inbound case, and I guess the out-of-bounds case should
> > > be handled through partial vectors.
> > >
> > > > > +      if (maybe_gt (safe_align, (unsigned)param_min_pagesize))
> > > > > +     {
> > > > > +       if (dump_enabled_p ())
> > > > > +         {
> > > > > +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +                            "alignment required for correctness (");
> > > > > +           dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > > > +           dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > > > +         }
> > > > > +       return false;
> > > > > +     }
> > > > > +
> > > > > +      /* There are multiple loads that have a misalignment that we 
> > > > > couldn't
> > > > > +      align.  We would need LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P to
> > > > > +      vectorize. */
> > > > > +      if (!STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > > > > +       && *misalignment != 0)
> > > >
> > > > that's probably the missing dr_aligned check?
> > > >
> > > > So, what I'm confused about is the following: we set ->target_alignment
> > > > (if possible) to sth that peeling can ensure the accesses will be
> > > > aligned - this then results in dr_aligned.  Here we verify if the
> > > > access is aligned and aligned enough.
> > > >
> > > > Now as said above, that we end up with dr_unaligned_unsupported is
> > > > off, so we probably shouldn't use that to indicate the higher
> > > > desired alignment to the peeling code.
> > > >
> > >
> > > This is because of cases like in
> > > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > >
> > > Where
> > >
> > >   /* check results:  */
> > >   for (i = 0; i < n; i++)
> > >     {
> > >       if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> > >         abort ();
> > >     }
> > >
> > > Where there are no group loads, but sa and ia have different misalignments
> > > and peeling can't do anything here as there's no way to peel them to a
> > > mutual alignment (it's not even something we try).
> > >
> > > So this is to reject these cases. If we get to vectorizable_load with any
> > misalignment
> > > still set, it means we had multiple known misalignments which we can't 
> > > vectorize.
> > >
> > > So explicitly, these cases *won't* be dr_aligned but will be marked as 
> > > requiring
> > > alignment.
> > 
> > Yes, but this should have been covered above already as well if you'd
> > actually checked for dr_aligned instead of just matching up
> > target_alignment (that wouldn't hold in this case) and required_alignment.
> > 
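[A minimal sketch of the check being suggested here; this is hypothetical and
not part of the posted patch.  target_alignment, required_alignment and
group_aligned refer to the locals in the hunk quoted above.]

      /* Accept the access when it is actually aligned and the target
         alignment is a power-of-two multiple of the required alignment,
         rather than insisting on exact equality.  */
      unsigned HOST_WIDE_INT mult;
      if (*alignment_support_scheme == dr_aligned
          && constant_multiple_p (target_alignment, required_alignment, &mult)
          && pow2p_hwi (mult))
        group_aligned = true;
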
> > > > > +     return false;
> > > > > +
> > > > > +      /* When using a group access the first element may be aligned 
> > > > > but the
> > > > > +      subsequent loads may not be.  For LOAD_LANES since the loads 
> > > > > are based
> > > > > +      on the first DR then all loads in the group are aligned.  For
> > > > > +      non-LOAD_LANES this is not the case. In particular a load + 
> > > > > blend when
> > > > > +      there are gaps can have the non first loads issued unaligned, 
> > > > > even
> > > > > +      partially overlapping the memory of the first load in order to 
> > > > > simplify
> > > > > +      the blend.  This is what the x86_64 backend does for instance. 
> > > > >  As
> > > > > +      such only the first load in the group is aligned, the rest are 
> > > > > not.  */
> > > > > +      else if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > > > > +            && *misalignment != 0
> > > > > +            && *memory_access_type != VMAT_LOAD_STORE_LANES)
> > > > > +     *alignment_support_scheme = dr_unaligned_supported;
> > > >
> > > > this looks definitely bogus, some targets cannot do unaligned accesses.
> > > >
> > >
> > > This is because of cases like 
> > > /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-
> > break_26.c:
> > >
> > >   for (int i = 2; i < 64; i+=2)
> > >     if (b[i] != a[i] - a[i-2]
> > >         || b[i+1] != a[i+1] - a[i-1])
> > >       abort ();
> > >
> > > Where we have a group load with a gap.  On x86_64 this results in two
> > > loads + a blend.  But the second load isn't aligned.
> > 
> > This should be either covered by the above checks already or if only
> > one load arrives here it's the fault of the downstream code splitting
> > the aligned load into something that isn't valid to speculate.
> > 
> > > I guess as we put all the alignment requirements on the group leader, that
> > > perhaps this code in get_load_store_type should be gated on
> > >
> > > STMT_VINFO_GROUPED_ACCESS (stmt_info)
> > > && DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info
> > >
> > > ?
> > 
> > No, I don't think so.  The code that eventually performs a
> > contiguous sub-group access directly should never extend
> > the load beyond GROUP_SIZE - or should be gated on the DR
> > not executed speculatively.  That is, we should "fix" this
> > elsewhere.
> > 
> 
> It doesn't, it's just not aligned within the range of GROUP_SIZE
> from what I remember.
> 
> > If you have an updated patch I can look at what's wrong here if you
> > tell me how to reproduce (after applying the patch I suppose).
> 
> Yes, applying the patch and running:
> 
> /work/build/gcc/xgcc -B/work/build/gcc/ 
> /work/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c  -m64   
> -fdiagnostics-plain-output  -flto -ffat-lto-objects -msse2 -ftree-vectorize 
> -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
> -fdump-tree-vect-details -msse4.1  -lm  -o ./vect-early-break_26.exe

So it works as in executing fine.  We have a VF of 4 and

note:   recording new base alignment for &b
  alignment:    32
  misalignment: 0
  based on:     _1 = b[i_32];
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   recording new base alignment for &a
  alignment:    32
  misalignment: 0
  based on:     _2 = a[i_32];
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   vect_compute_data_ref_alignment:
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   alignment increased due to early break to 32 bytes.
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
missed:   misalign = 8 bytes of ref b[i_32]
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   vect_compute_data_ref_alignment:
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   alignment increased due to early break to 32 bytes.

so no peeling necessary.  But we also have like

/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
missed:   misalign = 12 bytes of ref b[_6]
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   vect_compute_data_ref_alignment:
/space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c:35:21: 
note:   alignment increased due to early break to 32 bytes.

and we are correctly saying we vectorize an unaligned access.

The "issue" is we're having SLP nodes with a load permutation, their
expansion might not happen with the whole DR group in mind.  I'd say
we simply refuse to do early break speculative load vectorization
for SLP nodes with a load permutation.
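
[A rough sketch of such a bail-out; hypothetical, not an actual patch.  It
reuses LOOP_VINFO_EARLY_BREAKS, SLP_TREE_LOAD_PERMUTATION and the
dr_safe_speculative_read_required helper introduced by the patch, and would
sit in get_load_store_type.]

  if (loop_vinfo
      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
      && slp_node
      && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
      && dr_safe_speculative_read_required (stmt_info))
    {
      /* The permuted expansion may not access the DR group as a whole,
         so we cannot guarantee the speculative read stays in bounds.  */
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "early break not supported: load permutation on "
                         "speculative read at %G",
                         STMT_VINFO_STMT (stmt_info));
      return false;
    }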

It looks like a latent issue to me which could also interfere with
gap peeling; I have to dig a bit further into what code is responsible
for the current behavior ...



> Thanks,
> Tamar
> 
> > 
> > > Enforcing the alignment on every group member would be wrong I think since
> > > that ends up with a higher overall alignment than they need.
> > >
> > > > So besides these issues in get_load_store_type the change looks good 
> > > > now.
> > > >
> > >
> > > Thanks for the reviews.
> > >
> > > Tamar
> > > > Richard.
> > > >
> > > > > +      else
> > > > > +     *alignment_support_scheme = dr_aligned;
> > > > > +    }
> > > > > +
> > > > >    if (*alignment_support_scheme == dr_unaligned_unsupported)
> > > > >      {
> > > > >        if (dump_enabled_p ())
> > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > index
> > > >
> > b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..97caf61b345735d297ec49fd6ca
> > > > 64797435b46fc 100644
> > > > > --- a/gcc/tree-vectorizer.h
> > > > > +++ b/gcc/tree-vectorizer.h
> > > > > @@ -1281,7 +1281,11 @@ public:
> > > > >
> > > > >    /* Set by early break vectorization when this DR needs peeling for 
> > > > > alignment
> > > > >       for correctness.  */
> > > > > -  bool need_peeling_for_alignment;
> > > > > +  bool safe_speculative_read_required;
> > > > > +
> > > > > +  /* Set by early break vectorization when this DR's scalar accesses 
> > > > > are known
> > > > > +     to be inbounds of a known bounds loop.  */
> > > > > +  bool scalar_access_known_in_bounds;
> > > > >
> > > > >    tree base_decl;
> > > > >
> > > > > @@ -1997,6 +2001,35 @@ dr_target_alignment (dr_vec_info *dr_info)
> > > > >    return dr_info->target_alignment;
> > > > >  }
> > > > >  #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > > > > +#define DR_SCALAR_KNOWN_BOUNDS(DR) (DR)-
> > > > >scalar_access_known_in_bounds
> > > > > +
> > > > > +/* Return if the stmt_vec_info requires peeling for alignment.  */
> > > > > +inline bool
> > > > > +dr_safe_speculative_read_required (stmt_vec_info stmt_info)
> > > > > +{
> > > > > +  dr_vec_info *dr_info;
> > > > > +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT
> > (stmt_info));
> > > > > +  else
> > > > > +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > > +
> > > > > +  return dr_info->safe_speculative_read_required;
> > > > > +}
> > > > > +
> > > > > +/* Set the safe_speculative_read_required for the stmt_vec_info, if group
> > > > > +   access then set on the first element otherwise set on DR directly.  */
> > > > > +inline void
> > > > > +dr_set_safe_speculative_read_required (stmt_vec_info stmt_info,
> > > > > +                                    bool requires_alignment)
> > > > > +{
> > > > > +  dr_vec_info *dr_info;
> > > > > +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > > > +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT
> > (stmt_info));
> > > > > +  else
> > > > > +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > > > +
> > > > > +  dr_info->safe_speculative_read_required = requires_alignment;
> > > > > +}
> > > > >
> > > > >  inline void
> > > > >  set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguent...@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
