RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

Richard Biener Wed, 05 Feb 2025 02:16:56 -0800

On Wed, 5 Feb 2025, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Tuesday, February 4, 2025 12:49 PM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > Subject: RE: [PATCH]middle-end: delay checking for alignment to load 
> > [PR118464]
> > 
> > On Tue, 4 Feb 2025, Tamar Christina wrote:
> > 
> > > Looks like a last minute change I made accidentally blocked SVE. Fixed 
> > > and re-
> > sending:
> > >
> > > Hi All,
> > >
> > > This fixes two PRs on Early break vectorization by delaying the safety 
> > > checks to
> > > vectorizable_load when the VF, VMAT and vectype are all known.
> > >
> > > This patch does add two new restrictions:
> > >
> > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > >    group sizes, as they are unaligned every n % 2 iterations and so may 
> > > cross
> > >    a page unwittingly.
> > >
> > > 2. On LOAD_LANES targets when the buffer is unknown, we reject 
> > > vectorization
> > if
> > >    we cannot peel for alignment, as the alignment requirement is quite 
> > > large at
> > >    GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so 
> > > we
> > >    don't support it for now.
> > >
> > > There are other steps documented inside the code itself so that the 
> > > reasoning
> > > is next to the code.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > On arm-none-linux-gnueabihf some tests are failing to vectorize because 
> > > it looks
> > > like LOAD_LANES is often misaligned. I need to debug those a bit more to 
> > > see if
> > > it's the patch or backend.
> > >
> > > For now I think the patch itself is fine.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/118464
> > >   PR tree-optimization/116855
> > >   * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > >   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > >   checks.
> > >   (vect_compute_data_ref_alignment): Remove alignment checks and move
> > to
> > >   vectorizable_load.
> > >   (vect_enhance_data_refs_alignment): Add note to comment needing
> > >   investigating.
> > >   (vect_analyze_data_refs_alignment): Likewise.
> > >   (vect_supportable_dr_alignment): For group loads look at first DR.
> > >   * tree-vect-stmts.cc (get_load_store_type, vectorizable_load):
> > >   Perform safety checks for early break pfa.
> > >   * tree-vectorizer.h (dr_peeling_alignment): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/118464
> > >   PR tree-optimization/116855
> > >   * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > >   load type is relaxed later.
> > >   * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > >   * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > >   * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > >   * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > >
> > > -- inline copy of patch --
> > >
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index
> > dddde54a287dbdf504f540bc499e024d077746a8..85f9c49eff437221f2cea77c1
> > 14064a6a603b732 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -17246,7 +17246,7 @@ Maximum number of relations the oracle will
> > register in a basic block.
> > >  Work bound when discovering transitive relations from existing relations.
> > >
> > >  @item min-pagesize
> > > -Minimum page size for warning purposes.
> > > +Minimum page size for warning and early break vectorization purposes.
> > >
> > >  @item openacc-kernels
> > >  Specify mode of OpenACC `kernels' constructs handling.
> > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c
> > 92263af120c3ab2c21
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +#include <cstddef>
> > > +
> > > +struct ts1 {
> > > +  int spans[6][2];
> > > +};
> > > +struct gg {
> > > +  int t[6];
> > > +};
> > > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > > +  ts1 ret;
> > > +  for (size_t i = 0; i != t; i++) {
> > > +    if (!(i < t)) __builtin_abort();
> > > +    ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > > +    ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > > +  }
> > > +  return ret;
> > > +}
> > 
> > This one ICEd at some point?  Otherwise it doesn't test anything?
> > 
> 
> Yes. It and the one below it were ICEng and are not necessarily vectorizable
> code as they were ICEing early.
> 
> > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > index
> > 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d
> > 93c950629f3231554 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > > @@ -55,7 +55,9 @@ int main()
> > >     }
> > >      }
> > >    rephase ();
> > > +#pragma GCC novector
> > >    for (i = 0; i < 32; ++i)
> > > +#pragma GCC novector
> > >      for (j = 0; j < 3; ++j)
> > >  #pragma GCC novector
> > >        for (k = 0; k < 3; ++k)
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > index
> > 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..8bd85f3893f08157e640414b
> > 5b252b716a8ba93a 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > > @@ -5,7 +5,8 @@
> > >  /* { dg-additional-options "-O3" } */
> > >  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } 
> > > */
> > >
> > > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* Arm creates a group size of 3 here, which we can't support yet.  */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! 
> > > arm*-*-* }
> > } } } */
> > >
> > >  typedef struct filter_list_entry {
> > >    const char *name;
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8a
> > c83ab569fc9fbde126
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +int a, b, c, d, e, f;
> > > +short g[1];
> > > +int main() {
> > > +  int h;
> > > +  while (a) {
> > > +    while (h)
> > > +      ;
> > > +    for (b = 2; b; b--) {
> > > +      while (c)
> > > +        ;
> > > +      f = g[a];
> > > +      if (d)
> > > +        break;
> > > +    }
> > > +    while (e)
> > > +      ;
> > > +  }
> > > +  return 0;
> > > +}
> > 
> > again, what does this test?
> > 
> 
> That we don't ICE during data ref analysis.
> 
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..dc771186efafe25bb65490
> > da7a383ad7f6ceb0a7
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > +    for (int i = 1; i < n; i++) {
> > > +        if (string[i] == c)
> > > +            return &string[i];
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using 
> > > peeling" "vect"
> > } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..1cf58e4f6307f3d258cf093
> > afe6c86a998cdd216
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Alignment requirement too big, load lanes targets can't safely 
> > > vectorize this.
> > */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { !
> > vect_load_lanes } } } } */
> > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be 
> > > applied" "vect"
> > { target { ! vect_load_lanes } } } } */
> > > +
> > 
> > Any reason to not use "Alignment of access forced using peeling" like in
> > the other testcases?
> 
> Didn't know there was a preference.. will update.
> 
> > > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int 
> > > n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < (n - 2); i+=2)
> > > + {
> > > +   if (vect_a[i] > x || vect_a[i+2] > x)
> > > +     return 1;
> > > +
> > > +   vect_b[i] = x;
> > > +   vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758
> > ca29a5f3f9d3f6e0d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020];
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > +    for (int i = 0; i < n; i++) {
> > > +        if (string[i] == c)
> > > +            return &string[i];
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using 
> > > peeling"
> > "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..374a051b945e97eedb9be
> > 9da423cf54b5e564d6f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > +    for (int i = 1; i < n; i++) {
> > > +        if (string[i] == c)
> > > +            return &string[i];
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Alignment of access forced using 
> > > peeling" "vect"
> > } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f25
> > 7ceea1c065fcc6ae9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +char string[1020] __attribute__((aligned(1)));
> > > +
> > > +char * find(int n, char c)
> > > +{
> > > +    for (int i = 0; i < n; i++) {
> > > +        if (string[i] == c)
> > > +            return &string[i];
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using 
> > > peeling"
> > "vect" } } */
> > > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..32a4cee68f3418ed4fc660
> > 4ffd03ff4d8ff53d6b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i++)
> > > + {
> > > +   if (vect[i] > x)
> > > +     return 1;
> > > +
> > > +   vect[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Peeling for alignment will be applied" 
> > > "vect" } } */
> > 
> > See above.
> > 
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c
> > 64a61c97b1b6268743
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < n; i++)
> > > + {
> > > +   if (vect_a[i] > x || vect_b[i] > x)
> > > +     return 1;
> > > +
> > > +   vect_a[i] = x;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "Versioning for alignment will be 
> > > applied" "vect" }
> > } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f
> > 50776299531824ce9c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* This should be vectorizable through load_lanes and linear targets.  */
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, 
> > > int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 0; i < n; i+=2)
> > > + {
> > > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > > +     return 1;
> > > +
> > > +   vect_b[i] = x;
> > > +   vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..d4d87768a60803ed943f9
> > 5ccbda19e7e7812bf29
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump "Access will not read beyond buffer to 
> > > due known
> > size buffer" "vect" } } */
> > > +
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > > +     return 1;
> > > +
> > > +   vect_b[i] = x;
> > > +   vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..8419d7a9201dc8c433a23
> > 8f59520dea7e35c666e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > > @@ -0,0 +1,28 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* Group size is uneven, load lanes targets can't safely vectorize this. 
> > >  */
> > > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-not "Peeling for alignment will be 
> > > applied" "vect"
> > } } */
> > 
> > Likewise.
> > 
> > > +
> > > +
> > > +char vect_a[1025];
> > > +char vect_b[1025];
> > > +
> > > +unsigned test4(char x, int n)
> > > +{
> > > + unsigned ret = 0;
> > > + for (int i = 1; i < (n - 2); i+=2)
> > > + {
> > > +   if (vect_a[i-1] > x || vect_a[i+2] > x)
> > > +     return 1;
> > > +
> > > +   vect_b[i] = x;
> > > +   vect_b[i+1] = x+1;
> > > + }
> > > + return ret;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > index
> > b3f5984f682f30f79331d48a264c2cc4af3e2503..a4bab5a72e369892c65569a0
> > 4ec7507e32993ce8 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > > @@ -42,4 +42,6 @@ main ()
> > >    return 0;
> > >  }
> > >
> > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 
> > > "vect" } }
> > */
> > > +/* Targets that make this LOAD_LANES fail due to the group misalignment. 
> > >  */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 
> > > "vect" {
> > target vect_load_lanes } } } */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 
> > > "vect" {
> > target { ! vect_load_lanes } } } } */
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> > 6eda40267bd1382938a77826d11f20dcc959a166..d0148d4938cafc3c59edfa6a
> > 60002933f384f65b 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info
> > loop_vinfo)
> > >     if (is_gimple_debug (stmt))
> > >       continue;
> > >
> > > -   stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > +   stmt_vec_info stmt_vinfo
> > > +     = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> > >     auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > >     if (!dr_ref)
> > >       continue;
> > > @@ -748,26 +749,15 @@ vect_analyze_early_break_dependences
> > (loop_vec_info loop_vinfo)
> > >        bounded by VF so accesses are within range.  We only need to check
> > >        the reads since writes are moved to a safe place where if we get
> > >        there we know they are safe to perform.  */
> > > -   if (DR_IS_READ (dr_ref)
> > > -       && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > > +   if (DR_IS_READ (dr_ref))
> > >       {
> > > -       if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > > -           || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > > -         {
> > > -           const char *msg
> > > -             = "early break not supported: cannot peel "
> > > -               "for alignment, vectorization would read out of "
> > > -               "bounds at %G";
> > > -           return opt_result::failure_at (stmt, msg, stmt);
> > > -         }
> > > -
> > >         dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > >         dr_info->need_peeling_for_alignment = true;
> > 
> > You're setting the flag on any DR of a DR group here ...
> > 
> > >         if (dump_enabled_p ())
> > >           dump_printf_loc (MSG_NOTE, vect_location,
> > > -                          "marking DR (read) as needing peeling for "
> > > -                          "alignment at %G", stmt);
> > > +                          "marking DR (read) as possibly needing peeling 
> > > "
> > > +                          "for alignment at %G", stmt);
> > >       }
> > >
> > >     if (DR_IS_READ (dr_ref))
> > > @@ -1326,9 +1316,6 @@ vect_record_base_alignments (vec_info *vinfo)
> > >     Compute the misalignment of the data reference DR_INFO when 
> > > vectorizing
> > >     with VECTYPE.
> > >
> > > -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, 
> > > *RESULT will
> > > -   be set appropriately on failure (but is otherwise left unchanged).
> > > -
> > >     Output:
> > >     1. initialized misalignment info for DR_INFO
> > >
> > > @@ -1337,7 +1324,7 @@ vect_record_base_alignments (vec_info *vinfo)
> > >
> > >  static void
> > >  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> > > -                          tree vectype, opt_result *result = nullptr)
> > > +                          tree vectype)
> > >  {
> > >    stmt_vec_info stmt_info = dr_info->stmt;
> > >    vec_base_alignments *base_alignments = &vinfo->base_alignments;
> > > @@ -1365,66 +1352,6 @@ vect_compute_data_ref_alignment (vec_info *vinfo,
> > dr_vec_info *dr_info,
> > >      = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> > >            BITS_PER_UNIT);
> > >
> > > -  /* If this DR needs peeling for alignment for correctness, we must
> > > -     ensure the target alignment is a constant power-of-two multiple of 
> > > the
> > > -     amount read per vector iteration (overriding the above hook where
> > > -     necessary).  */
> > > -  if (dr_info->need_peeling_for_alignment)
> > > -    {
> > > -      /* Vector size in bytes.  */
> > > -      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> > > (vectype));
> > > -
> > > -      /* We can only peel for loops, of course.  */
> > > -      gcc_checking_assert (loop_vinfo);
> > > -
> > > -      /* Calculate the number of vectors read per vector iteration.  If
> > > -  it is a power of two, multiply through to get the required
> > > -  alignment in bytes.  Otherwise, fail analysis since alignment
> > > -  peeling wouldn't work in such a case.  */
> > > -      poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > -      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > - num_scalars *= DR_GROUP_SIZE (stmt_info);
> > > -
> > > -      auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> > > -      if (!pow2p_hwi (num_vectors))
> > > - {
> > > -   *result = opt_result::failure_at (vect_location,
> > > -                                     "non-power-of-two num vectors %u "
> > > -                                     "for DR needing peeling for "
> > > -                                     "alignment at %G",
> > > -                                     num_vectors, stmt_info->stmt);
> > > -   return;
> > > - }
> > > -
> > > -      safe_align *= num_vectors;
> > > -      if (maybe_gt (safe_align, 4096U))
> > > - {
> > > -   pretty_printer pp;
> > > -   pp_wide_integer (&pp, safe_align);
> > > -   *result = opt_result::failure_at (vect_location,
> > > -                                     "alignment required for correctness"
> > > -                                     " (%s) may exceed page size",
> > > -                                     pp_formatted_text (&pp));
> > > -   return;
> > > - }
> > > -
> > > -      unsigned HOST_WIDE_INT multiple;
> > > -      if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> > > -   || !pow2p_hwi (multiple))
> > > - {
> > > -   if (dump_enabled_p ())
> > > -     {
> > > -       dump_printf_loc (MSG_NOTE, vect_location,
> > > -                        "forcing alignment for DR from preferred (");
> > > -       dump_dec (MSG_NOTE, vector_alignment);
> > > -       dump_printf (MSG_NOTE, ") to safe align (");
> > > -       dump_dec (MSG_NOTE, safe_align);
> > > -       dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> > > -     }
> > > -   vector_alignment = safe_align;
> > > - }
> > > -    }
> > > -
> > >    SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> > >
> > >    /* If the main loop has peeled for alignment we have no way of knowing
> > > @@ -2479,7 +2406,7 @@ vect_enhance_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > >        || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT 
> > > (loop_vinfo),
> > >                                  loop_preheader_edge (loop))
> > >        || loop->inner
> > > -      || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> > > +      || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) // <<-- ??
> > 
> > Spurious change(?)
> 
> I was actually wondering why this is here. I'm curious why we're saying we 
> can't
> peel for alignment on an inverted loop.


No idea either.

> > 
> > >      do_peeling = false;
> > >
> > >    struct _vect_peel_extended_info peel_for_known_alignment;
> > > @@ -2942,12 +2869,9 @@ vect_analyze_data_refs_alignment (loop_vec_info
> > loop_vinfo)
> > >     if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> > >         && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> > >       continue;
> > > -   opt_result res = opt_result::success ();
> > > +
> > >     vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> > > -                                    STMT_VINFO_VECTYPE (dr_info->stmt),
> > > -                                    &res);
> > > -   if (!res)
> > > -     return res;
> > > +                                    STMT_VINFO_VECTYPE (dr_info->stmt));
> > >   }
> > >      }
> > >
> > > @@ -7211,7 +7135,7 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> > dr_vec_info *dr_info,
> > >
> > >    if (misalignment == 0)
> > >      return dr_aligned;
> > > -  else if (dr_info->need_peeling_for_alignment)
> > > +  else if (dr_peeling_alignment (stmt_info))
> > >      return dr_unaligned_unsupported;
> > >
> > >    /* For now assume all conditional loads/stores support unaligned
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index
> > 1b639ae3b174779bb838d0f7ce4886d84ecafcfe..c213ec46c4bdb456f99a43a3a
> > ff7c1af80e8769a 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2597,6 +2597,69 @@ get_load_store_type (vec_info  *vinfo,
> > stmt_vec_info stmt_info,
> > >        return false;
> > >      }
> > >
> > > +  auto unit_size = GET_MODE_UNIT_SIZE (TYPE_MODE (vectype));
> > > +  /* Check if a misalignment with an unsupported peeling for early break 
> > > is
> > > +     still OK.  First we need to distinguish between when we've reached 
> > > here do
> > 
> > due to
> > 
> > > +     to dependency analysis or when the user has requested 
> > > -mstrict-align or
> > > +     similar.  In those cases we must not override it.  */
> > > +  if (dr_peeling_alignment (stmt_info)
> > > +      && *alignment_support_scheme == dr_unaligned_unsupported
> > > +      /* We can only attempt to override if the misalignment is a 
> > > multiple of
> > > +  the element being loaded, otherwise peeling or versioning would have
> > > +  really been required.  */
> > > +      && multiple_p (*misalignment, unit_size))
> > 
> > Hmm, but wouldn't that mean dr_info->target_alignment is bigger
> > than the vector size?  Does that ever happen?  I'll note that
> > *alignment_support_scheme == dr_aligned means alignment according
> > to dr_info->target_alignment which might be actually less than
> > the vector size (we've noticed this recently in other PRs), so
> > we might want to make sure that dr_info->target_alignment is
> > at least vector size when ->need_peeling_for_alignment I think.
> > 
> 
> One reason I block LOAD_LANES from the non-inbound case is that in
> those cases dr_info->target_alignment would need to be GROUP_SIZE * vector 
> size
> to ensure that the entire access doesn't cross a page.  Because this puts an
> excessive alignment request in place I currently just reject the loop.

But that's true for all grouped accesses and one reason we wanted to move
this code here - we know group_size and the vectorization factor.

> But In principal it can happen, however the above checks for element size, not
> vector size.  This fails when the user has intentionally misaligned the array 
> and we
> don't support peeling for the access type to correct it.
> 
> So something like vect-early-break_133_pfa3.c with a grouped access for 
> instance.
> 
> > So - which of the testcases gets you here?  I think we
> > set *misalignment to be modulo target_alignment, never larger than that.
> > 
> 
> The condition passes for most cases yes, unless we can't peel in which case we
> vect-early-break_22.c and vect-early-break_121-pr114081.c two that get 
> rejected
> inside the block.
> 
> > > +    {
> > > +      bool inbounds
> > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info),
> > > +                           DR_REF (STMT_VINFO_DATA_REF (stmt_info)));
> > > +      /* If we have a known misalignment, and are doing a group load for 
> > > a DR
> > > +  that requires aligned access, check if the misalignment is a multiple
> > > +  of the unit size.  In which case the group load will be issued aligned
> > > +  as long as the first load in the group is aligned.
> > > +
> > > +  For the non-inbound case we'd need goup_size * vectype alignment.  But
> > > +  this is quite huge and unlikely to ever happen so if we can't peel for
> > > +  it, just reject it.  */

I don't think the in-bound case is any different from the non-in-bound
case unless the size of the object is a multiple of the whole vector
access size as well.

> > > +      if (*memory_access_type == VMAT_LOAD_STORE_LANES
> > > +   && (STMT_VINFO_GROUPED_ACCESS (stmt_info) || slp_node))
> > > + {
> > > +   /* ?? This needs updating whenever we support slp group > 1.  */

?

> > > +   auto group_size = DR_GROUP_SIZE (stmt_info);
> > > +   /* For the inbound case it's enough to check for an alignment of
> > > +      GROUP_SIZE * element size.  */
> > > +   if (inbounds
> > > +       && (*misalignment % (group_size * unit_size)) == 0
> > > +       && group_size % 2 == 0)

It looks fishy that you do not need to consider the VF here.

> > > +     {
> > > +       if (dump_enabled_p ())
> > > +         dump_printf_loc (MSG_NOTE, vect_location,
> > > +                  "Assuming grouped access is aligned due to load "
> > > +                  "lanes, overriding alignment scheme\n");
> > > +
> > > +       *alignment_support_scheme = dr_unaligned_supported;
> > > +     }
> > > + }
> > > +      /* If we have a linear access and know the misalignment and know 
> > > we won't
> > > +  read out of bounds then it's also ok if the misalignment is a multiple
> > > +  of the element size.  We get this when the loop has known misalignments
> > > +  but the misalignments of the DRs can't be peeled to reach mutual
> > > +  alignment.  Because the misalignments are known however we also know
> > > +  that versioning won't work.  If the target does support unaligned
> > > +  accesses and we know we are free to read the entire buffer then we
> > > +  can allow the unaligned access if it's on elements for an early break
> > > +  condition.  */

See above - one of the PRs was exactly that we ovverread a decl even
if the original scalar accesses are all in-bounds.  So we can't allow 
this.

> > > +      else if (*memory_access_type != VMAT_GATHER_SCATTER
> > > +        && inbounds)
> > > + {
> > > +   if (dump_enabled_p ())
> > > +     dump_printf_loc (MSG_NOTE, vect_location,
> > > +                  "Access will not read beyond buffer to due known size "
> > > +                  "buffer, overriding alignment scheme\n");
> > > +
> > > +   *alignment_support_scheme = dr_unaligned_supported;
> > > + }
> > > +    }
> > > +
> > >    if (*alignment_support_scheme == dr_unaligned_unsupported)
> > >      {
> > >        if (dump_enabled_p ())
> > > @@ -10520,6 +10583,68 @@ vectorizable_load (vec_info *vinfo,
> > >    /* Transform.  */
> > >
> > >    dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info =
> > NULL;
> > > +
> > > +  /* Check if we support the operation if early breaks are needed.  */
> > > +  if (loop_vinfo
> > > +      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +      && (memory_access_type == VMAT_GATHER_SCATTER
> > > +   || memory_access_type == VMAT_STRIDED_SLP))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +                    "early break not supported: cannot peel for "
> > > +                    "alignment. With non-contiguous memory vectorization"
> > > +                    " could read out of bounds at %G ",
> > > +                    STMT_VINFO_STMT (stmt_info));
> > 
> > Hmm, this is now more restrictive than the original check in
> > vect_analyze_early_break_dependences because it covers all accesses.
> > The simplest fix would be to leave it there.
> > 
> 
> It covers all loads, which you're right is more restrictive, I think it just 
> needs to be
> moved inside   if (costing_p && dr_info->need_peeling_for_alignment) block
> below it though.
> 
> Delaying this to here instead of earlier has allowed us to vectorize 
> gcc.dg/vect/bb-slp-pr65935.c
> Which now vectorizes after the inner loops are unrolled.
> 
> Are you happy with just moving it down?

OK.

> > > +      return false;
> > > +    }
> > > +
> > > +  /* If this DR needs peeling for alignment for correctness, we must
> > > +     ensure the target alignment is a constant power-of-two multiple of 
> > > the
> > > +     amount read per vector iteration (overriding the above hook where
> > > +     necessary).  We don't support group loads, which would have been 
> > > filterd
> > > +     out in the check above.  For now it means we don't have to look at 
> > > the
> > > +     group info and just check that the load is continguous and can just 
> > > use
> > > +     dr_info.  For known size buffers we still need to check if the 
> > > vector
> > > +     is misaligned and if so we need to peel.  */
> > > +  if (costing_p && dr_info->need_peeling_for_alignment)
> > 
> > dr_peeling_alignment ()
> > 
> > I think this belongs in get_load_store_type, specifically I think
> > we want to only allow VMAT_CONTIGUOUS for need_peeling_for_alignment refs
> > and by construction dr_aligned should ensure type size alignment
> > (by altering target_alignment for the respective refs).  Given that
> > both VF and vector type are fixed at vect_analyze_data_refs_alignment
> > time we should be able to compute the appropriate target alignment
> > there (I'm not sure we support peeling of more than VF-1 iterations
> > though).
> > 
> > > +    {
> > > +      /* Vector size in bytes.  */
> > > +      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> > > (vectype));
> > > +
> > > +      /* We can only peel for loops, of course.  */
> > > +      gcc_checking_assert (loop_vinfo);
> > > +
> > > +      auto num_vectors = ncopies;
> > > +      if (!pow2p_hwi (num_vectors))
> > > + {
> > > +   if (dump_enabled_p ())
> > > +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +                      "non-power-of-two num vectors %u "
> > > +                      "for DR needing peeling for "
> > > +                      "alignment at %G",
> > > +                      num_vectors, STMT_VINFO_STMT (stmt_info));
> > > +   return false;
> > > + }
> > > +
> > > +      safe_align *= num_vectors;
> > > +      if (known_gt (safe_align, (unsigned)param_min_pagesize)
> > > +   /* For VLA we don't support PFA when any unrolling needs to be done.
> > > +      We could though but too much work for GCC 15.  For now we assume a
> > > +      vector is not larger than a page size so allow single loads.  */
> > > +   && (num_vectors > 1 && !vf.is_constant ()))
> > > + {
> > > +   if (dump_enabled_p ())
> > > +     {
> > > +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +                        "alignment required for correctness (");
> > > +       dump_dec (MSG_MISSED_OPTIMIZATION, safe_align);
> > > +       dump_printf (MSG_NOTE, ") may exceed page size\n");
> > > +     }
> > > +   return false;
> > > + }
> > > +    }
> > > +
> > >    ensure_base_align (dr_info);
> > >
> > >    if (memory_access_type == VMAT_INVARIANT)
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index
> > 44d3a1d46c409597f1e67a275211a1da414fc7c7..6ef97ee84336ac9a0e192214
> > 5f3d418436d709f4 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -1998,6 +1998,19 @@ dr_target_alignment (dr_vec_info *dr_info)
> > >  }
> > >  #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR)
> > >
> > > +/* Return if the stmt_vec_info requires peeling for alignment.  */
> > > +inline bool
> > > +dr_peeling_alignment (stmt_vec_info stmt_info)
> > > +{
> > > +  dr_vec_info *dr_info;
> > > +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > +    dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > 
> > ... but checking it on the first only.
> 
> I did it that way because I was under the assumption that group loads could 
> be relaxed
> to e.g. element wise or some other form.  If it's the case that the group 
> cannot grow or
> be changed I could instead set it only on the first access and then not need 
> to check it
> elsewhere if you prefer.

I've merely noted the discrepancy - consider

  if (a[2*i+1])
    early break;
  ... = a[2*i];

then you'd set ->needs_peeling on the 2nd group member but
dr_peeling_alignment would always check the first.  So yes, I think
we always want to set the flag on the first element of a grouped
access.  We're no longer splitting groups when loop vectorizing.

Richard.

> 
> Thanks,
> Tamar
> > 
> > > +  else
> > > +    dr_info = STMT_VINFO_DR_INFO (stmt_info);
> > > +
> > > +  return dr_info->need_peeling_for_alignment;
> > > +}
> > > +
> > >  inline void
> > >  set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val)
> > >  {
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

Reply via email to