> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Wednesday, February 12, 2025 2:58 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]
>
> On Tue, 11 Feb 2025, Tamar Christina wrote:
>
> > Hi All,
> >
> > This fixes two PRs in early break vectorization by delaying the safety
> > checks to vectorizable_load, when the VF, VMAT and vectype are all known.
> >
> > This patch does add two new restrictions:
> >
> > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> >    group sizes, as they are unaligned every n % 2 iterations and so may
> >    cross a page unwittingly.
> >
> > 2. On LOAD_LANES targets where the buffer size is unknown, we reject
> >    vectorization if we cannot peel for alignment, as the alignment
> >    requirement is quite large at GROUP_SIZE * vectype_size.  This is
> >    unlikely to ever be beneficial, so we don't support it for now.
> >
> > Other steps are documented inside the code itself so that the reasoning
> > sits next to the code.
> >
> > Note that for VLA I have still left this fully disabled when not working on
> > a fixed buffer.
> >
> > For VLA targets like SVE, we return element alignment as the desired vector
> > alignment.  This means that the loads are never misaligned and so,
> > annoyingly, it won't ever need to peel.
> >
> > So what I think needs to happen in GCC 16 is:
> >
> > 1. During vect_compute_data_ref_alignment, we need to take the max of
> >    POLY_VALUE_MIN and vector_alignment.
> >
> > 2. In vect_do_peeling, define skip_vector when doing PFA for VLA, and in the
> >    guard add a check that ncopies * vectype does not exceed POLY_VALUE_MAX,
> >    which we use as a proxy for the page size.
> >
> > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> >    vect_determine_partial_vectors_and_peeling, since the first iteration
> >    has to be partial.  If !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P, we have to
> >    fail to vectorize.
> >
> > 4. Create a default mask to be used, so that
> >    vect_use_loop_mask_for_alignment_p becomes true and we generate the
> >    peeled check through loop control for partial loops.  From what I can
> >    tell, this won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P, since those
> >    loops don't have any peeling support at all in the compiler.  That would
> >    need to be done independently from the above.
>
> We basically need to implement peeling/versioning for alignment based
> on the actual POLY value with the fallback being first-fault loads.
>
> > In any case, this is not GCC 15 material, so I've kept the WIP patches I
> > have downstream.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/118464
> > 	PR tree-optimization/116855
> > 	* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > 	checks.
> > 	(vect_compute_data_ref_alignment): Remove alignment checks and move to
> > 	get_load_store_type, increase group access alignment.
> > 	(vect_enhance_data_refs_alignment): Add note to comment needing
> > 	investigating.
> > 	(vect_analyze_data_refs_alignment): Likewise.
> > 	(vect_supportable_dr_alignment): For group loads look at first DR.
> > 	* tree-vect-stmts.cc (get_load_store_type):
> > 	Perform safety checks for early break pfa.
> > 	* tree-vectorizer.h (dr_peeling_alignment,
> > 	dr_set_peeling_alignment): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/118464
> > 	PR tree-optimization/116855
> > 	* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > 	load type is relaxed later.
> > 	* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > 	* gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets.
> > 	* g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > 	* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > 	* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > 	* g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> > 	* gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > 	* gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
> > 	* gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopts-5.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopts-6.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopts-7.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopts-8.c: Likewise.
> > 	* gcc.dg/tree-ssa/ivopts-9.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-10.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-11.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-12.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-5.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-6.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-7.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-8.c: Likewise.
> > 	* gcc.dg/tree-ssa/predcom-dse-9.c: Likewise.
> > 	* gcc.target/i386/pr90178.c: Likewise.
> > 	* gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
> >
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 0aef2abf05b9b2f5996de69d5ebc3a21109ee6e1..db00f8b403814b58261849d8917863dc06bbf3e2 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -17256,7 +17256,7 @@ Maximum number of relations the oracle will register in a basic block.
> >  Work bound when discovering transitive relations from existing relations.
> >
> >  @item min-pagesize
> > -Minimum page size for warning purposes.
> > +Minimum page size for warning and early break vectorization purposes.
> >
> >  @item openacc-kernels
> >  Specify mode of OpenACC `kernels' constructs handling.
> > diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > index 0db57c8d3a01985e1e76bb9f8a52613179060f19..5980bf316899553e16d078deee32911f31fafd94 100644
> > --- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > +++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > @@ -10,6 +10,7 @@ inline Iter
> >  my_find(Iter first, Iter last, Pred pred)
> >  {
> >  #pragma GCC unroll 4
> > +#pragma GCC novector
> >    while (first != last && !pred(*first))
> >      ++first;
> >    return first;
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5e50e56ad17515e278c05c92263af120c3ab2c21
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_7-pr118464.cc
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include <cstddef>
> > +
> > +struct ts1 {
> > +  int spans[6][2];
> > +};
> > +struct gg {
> > +  int t[6];
> > +};
> > +ts1 f(size_t t, struct ts1 *s1, struct gg *s2) {
> > +  ts1 ret;
> > +  for (size_t i = 0; i != t; i++) {
> > +    if (!(i < t)) __builtin_abort();
> > +    ret.spans[i][0] = s1->spans[i][0] + s2->t[i];
> > +    ret.spans[i][1] = s1->spans[i][1] + s2->t[i];
> > +  }
> > +  return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > index a35999a172ac762bb4873d10b331301750f4015b..00fc8f01991cc994737bc2088e72d85f249bf341 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > @@ -29,6 +29,7 @@ int main ()
> >    }
> >
> >    /* check results: */
> > +#pragma GCC novector
> >    for (i = 0; i < N; i++)
> >      {
> >        if (ca[i] != cb[i])
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > index 9f14a54c413757df7230b7b6053c83a8a5a1e6c9..99d5e6231ff053089782b52dc6ce9b9ccb8c64a0 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-25.c
> > @@ -27,6 +27,7 @@ int main_1 (int n, int *p)
> >      }
> >
> >    /* check results: */
> > +#pragma GCC novector
> >    for (i = 0; i < N; i++)
> >      {
> >        if (ia[i] != n)
> > @@ -40,6 +41,7 @@ int main_1 (int n, int *p)
> >      }
> >
> >    /* check results: */
> > +#pragma GCC novector
> >    for (i = 0; i < N; i++)
> >      {
> >        if (ib[i] != k)
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > index 62d2b5049fd902047540b90a2ef79b789f903969..1202ec326c7e0020daf58af9544cdbe2b1da4914 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > @@ -23,6 +23,7 @@ int main ()
> >    }
> >
> >    /* check results: */
> > +#pragma GCC novector
> >    for (i = 0; i < N; i++)
> >      {
> >        if (s.ca[i] != 5)
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > index dd06e598f7f48e1a75eba41d626860404325259d..b79bd10585f501992c93648ea1a1f2d2699c07c1 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > -/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details -fno-tree-vectorize" } */
> >
> >  /* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> >   * two ivs i and p2 can be eliminate.  */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > index a6af497f4bf7f1ef6c64e09b87931225287d78e0..7b9615f07f3c4af3657eb7d0183c1a51de9fbc42 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-5.c
> > @@ -5,6 +5,7 @@ int*
> >  foo (int* mem, int sz, int val)
> >  {
> >    int i;
> > +#pragma GCC novector
> >    for (i = 0; i < sz; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > index 8383154f99f2559873ef5b3a8fa8119cf679782f..08304293140a82e5484c8399b4374a474c66b34b 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-6.c
> > @@ -5,6 +5,7 @@ int*
> >  foo (int* mem, int sz, int val)
> >  {
> >    int i;
> > +#pragma GCC novector
> >    for (i = 0; i != sz; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > index 44f5603d4f5b8da6c759e8732503638131b0fca8..03160f234f74319cda6d7450788da871ea0cea74 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-7.c
> > @@ -5,6 +5,7 @@ int*
> >  foo (int* mem, int beg, int end, int val)
> >  {
> >    int i;
> > +#pragma GCC novector
> >    for (i = beg; i < end; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > index b2556eaac0d02f65a50bbd532a47fef9c0b1dfa8..a7fd3c9de3746c116dfb73419805fd7ce6e69ffa 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-8.c
> > @@ -5,6 +5,7 @@ int*
> >  foo (int* mem, char sz, int val)
> >  {
> >    char i;
> > +#pragma GCC novector
> >    for (i = 0; i < sz; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > index d26d994f9bd28bc2346a6878d48b159729851ef6..fb9656b88d7bea8a9a84e2ca6ff877a2aac7e05b 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-9.c
> > @@ -5,6 +5,7 @@ int*
> >  foo (int* mem, unsigned char sz, int val)
> >  {
> >    unsigned char i;
> > +#pragma GCC novector
> >    for (i = 0; i < sz; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > index a0a04a08c61d48128ad5fd1a11daaf0abc783053..b660f9d258423356a4d73d5996a5f1a8ede9ead9 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > index f770a8ad812aedee8f65b011134cda91cbe2bf91..8e5a3a434986a31bb635bf3bc1ecc36d463f2ee7 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-10.c
> > @@ -23,6 +23,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > index ed2b96a0d1a4e0c90bf52a83b5f21e2fd1c5a5c5..fd56fd9747e3c572c93107188ede7482ad01bb99 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-11.c
> > @@ -29,6 +29,7 @@ void check (int *a, int *res, int len, int sum, int val)
> >    if (sum != val)
> >      abort ();
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > index 2487c1c8205a4f09fd16974f3599ddc8c48b92cf..5eac905aff87e6c4aa4449c689d2594b240fec4e 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-12.c
> > @@ -37,6 +37,7 @@ void check (int *a, int *res, int len, int sval)
> >    if (sum != sval)
> >      abort ();
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > index 020ca705790d6ace707184c9d2804f3d690de916..801acad33e9d6b7eb17f0cde408903c4f2674acc 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > @@ -32,6 +32,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > index 667cc333d9f2c030474e0b3115c0b86cda733c2e..8b82bdbc0c92cc579824393dc15f2f5a3e5f55e5 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > @@ -40,6 +40,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > index 8118461af0b63d1f9b42879783ae2650a9d9b34a..0d64bc72f82341fd0518a6f59ad2a10aec7b0088 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > index 03fa646661e2839946e80e0b27ea1d0ea0ef9aeb..7db3bca3b2df98f3c0b3db00be18fc8054644655 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-5.c
> > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > index ab2fd403d3005ba06d9992580945ce28f8fb1c09..1267bae5f1c44d60d484cca7d88a5714770f147f 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-6.c
> > @@ -35,6 +35,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > index c746ebd715561eb9f7192a433c321f86e0751eaa..cfe44a06ce4ada6fddc3659ddf748a16904b5d9e 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-7.c
> > @@ -33,6 +33,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > index 6c4e9afa487ed33e4ab5d887640e0efa44a72c6d..646e43d9aad2b235bdae0d9d52df89a3da2dd3e4 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-8.c
> > @@ -31,6 +31,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > index 9c5e8ca9a793b0405e7f448798aa1fac483d2f05..30daf82fac5cef2e26e4597aa4eb10aa33cd0af2 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-9.c
> > @@ -69,6 +69,7 @@ void check (int *a, int *res, int len)
> >  {
> >    int i;
> >
> > +#pragma GCC novector
> >    for (i = 0; i < len; i++)
> >      if (a[i] != res[i])
> >        abort ();
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > index 9ef1330b47c817e16baaafa44c2b15108b9dd3a9..4c8255895b976653228233d93c950629f3231554 100644
> > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> > @@ -55,7 +55,9 @@ int main()
> >        }
> >      }
> >    rephase ();
> > +#pragma GCC novector
> >    for (i = 0; i < 32; ++i)
> > +#pragma GCC novector
> >      for (j = 0; j < 3; ++j)
> >  #pragma GCC novector
> >        for (k = 0; k < 3; ++k)
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > index 423ff0b566b18bf04ce4f67a45b94dc1a021a4a0..5464d1d56fe97542a2dfc7afba39aabc0468737c 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> > @@ -5,7 +5,8 @@
> >  /* { dg-additional-options "-O3" } */
> >  /* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
> >
> > -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* Arm and -m32 create a group size of 3 here, which we can't support yet.  */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { { ! arm*-*-* } || { { x86_64-*-* i?86-*-* } && ilp64 } } } } } */
> >
> >  typedef struct filter_list_entry {
> >    const char *name;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9bf0cbc8853f74de550e8ac83ab569fc9fbde126
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_132-pr118464.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +int a, b, c, d, e, f;
> > +short g[1];
> > +int main() {
> > +  int h;
> > +  while (a) {
> > +    while (h)
> > +      ;
> > +    for (b = 2; b; b--) {
> > +      while (c)
> > +        ;
> > +      f = g[a];
> > +      if (d)
> > +        break;
> > +    }
> > +    while (e)
> > +      ;
> > +  }
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..dc771186efafe25bb65490da7a383ad7f6ceb0a7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa1.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > +    for (int i = 1; i < n; i++) {
> > +        if (string[i] == c)
> > +            return &string[i];
> > +    }
> > +    return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..82d473a279ce060c550289c61729d9f9b56f0d2a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa10.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Alignment requirement too big, load lanes targets can't safely vectorize this.  */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! vect_load_lanes } } } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" { target { ! vect_load_lanes } } } } */
> > +
> > +unsigned test4(char x, char *restrict vect_a, char *restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < (n - 2); i+=2)
> > + {
> > +   if (vect_a[i] > x || vect_a[i+2] > x)
> > +     return 1;
> > +
> > +   vect_b[i] = x;
> > +   vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7d56772fbf380ce42ac758ca29a5f3f9d3f6e0d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020];
> > +
> > +char * find(int n, char c)
> > +{
> > +    for (int i = 0; i < n; i++) {
> > +        if (string[i] == c)
> > +            return &string[i];
> > +    }
> > +    return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..374a051b945e97eedb9be9da423cf54b5e564d6f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa3.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > +    for (int i = 1; i < n; i++) {
> > +        if (string[i] == c)
> > +            return &string[i];
> > +    }
> > +    return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..297fb7e9b9beffa25ab8f257ceea1c065fcc6ae9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa4.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +char string[1020] __attribute__((aligned(1)));
> > +
> > +char * find(int n, char c)
> > +{
> > +    for (int i = 0; i < n; i++) {
> > +        if (string[i] == c)
> > +            return &string[i];
> > +    }
> > +    return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > +/* { dg-final { scan-tree-dump "force alignment of string" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ca95be44e92e32769da1d1e9b740ae54682a3d55
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa5.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i++)
> > + {
> > +   if (vect[i] > x)
> > +     return 1;
> > +
> > +   vect[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ee123df6ed2ba97e92307c64a61c97b1b6268743
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect_a, char *vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < n; i++)
> > + {
> > +   if (vect_a[i] > x || vect_b[i] > x)
> > +     return 1;
> > +
> > +   vect_a[i] = x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..51bad4e745b67cfdaad20f50776299531824ce9c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa7.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* This should be vectorizable through load_lanes and linear targets.  */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char * restrict vect_a, char * restrict vect_b, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i+=2)
> > + {
> > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > +     return 1;
> > +
> > +   vect_b[i] = x;
> > +   vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..dbb14ba3239c91b9bfdf56cecc60750394e10f2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa8.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > +     return 1;
> > +
> > +   vect_b[i] = x;
> > +   vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..31e2096209253539483253efc17499a53d112894
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa9.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* Group size is uneven, load lanes targets can't safely vectorize this.  */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
> > +
> > +
> > +char vect_a[1025];
> > +char vect_b[1025];
> > +
> > +unsigned test4(char x, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 1; i < (n - 2); i+=2)
> > + {
> > +   if (vect_a[i-1] > x || vect_a[i+2] > x)
> > +     return 1;
> > +
> > +   vect_b[i] = x;
> > +   vect_b[i+1] = x+1;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > index 9d3c6a5dffe3be4a7759b150e330d18144ab5ce5..b3f40b8c9ba49e41bd283e46a462238c3b5825ef 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > @@ -23,4 +23,5 @@ unsigned test4(unsigned x, unsigned n)
> >   return ret;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > +/* Cannot safely vectorize this due to the group misalignment.  */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr90178.c b/gcc/testsuite/gcc.target/i386/pr90178.c
> > index 1df36af0541c01f3624fe51efbc8cfa0ec67fe60..e9fea04fb148ed53c1ac9b2c6ed73e85ba982b42 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr90178.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr90178.c
> > @@ -4,6 +4,7 @@
> >  int*
> >  find_ptr (int* mem, int sz, int val)
> >  {
> > +#pragma GCC novector
> >    for (int i = 0; i < sz; i++)
> >      if (mem[i] == val)
> >        return &mem[i];
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index 6d5854ac7c7a18e09ec7ad72c534abdc55cb6efa..c85df96685f64f9814251f2d4fdbcc5973f2b513 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -731,7 +731,8 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> >            if (is_gimple_debug (stmt))
> >              continue;
> >
> > -          stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > +          stmt_vec_info stmt_vinfo
> > +            = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (stmt));
> >            auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> >            if (!dr_ref)
> >              continue;
> > @@ -748,26 +749,14 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> >               bounded by VF so accesses are within range.  We only need to check
> >               the reads since writes are moved to a safe place where if we get
> >               there we know they are safe to perform.  */
> > -          if (DR_IS_READ (dr_ref)
> > -              && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > +          if (DR_IS_READ (dr_ref))
> >              {
> > -              if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> > -                  || STMT_VINFO_STRIDED_P (stmt_vinfo))
> > -                {
> > -                  const char *msg
> > -                    = "early break not supported: cannot peel "
> > -                      "for alignment, vectorization would read out of "
> > -                      "bounds at %G";
> > -                  return opt_result::failure_at (stmt, msg, stmt);
> > -                }
> > -
> > -              dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> > -              dr_info->need_peeling_for_alignment = true;
> > +              dr_set_peeling_alignment (stmt_vinfo, true);
> >
> >                if (dump_enabled_p ())
> >                  dump_printf_loc (MSG_NOTE, vect_location,
> > -                                 "marking DR (read) as needing peeling for "
> > -                                 "alignment at %G", stmt);
> > +                                 "marking DR (read) as possibly needing peeling "
> > +                                 "for alignment at %G", stmt);
> >              }
> >
> >            if (DR_IS_READ (dr_ref))
> > @@ -1326,9 +1315,6 @@ vect_record_base_alignments (vec_info *vinfo)
> >     Compute the misalignment of the data reference DR_INFO when vectorizing
> >     with VECTYPE.
> >
> > -   RESULT is non-NULL iff VINFO is a loop_vec_info.  In that case, *RESULT will
> > -   be set appropriately on failure (but is otherwise left unchanged).
> > -
> >     Output:
> >     1.
initialized misalignment info for DR_INFO > > > > @@ -1337,7 +1323,7 @@ vect_record_base_alignments (vec_info *vinfo) > > > > static void > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info, > > - tree vectype, opt_result *result = nullptr) > > + tree vectype) > > { > > stmt_vec_info stmt_info = dr_info->stmt; > > vec_base_alignments *base_alignments = &vinfo->base_alignments; > > @@ -1365,63 +1351,20 @@ vect_compute_data_ref_alignment (vec_info > *vinfo, dr_vec_info *dr_info, > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype), > > BITS_PER_UNIT); > > > > - /* If this DR needs peeling for alignment for correctness, we must > > - ensure the target alignment is a constant power-of-two multiple of the > > - amount read per vector iteration (overriding the above hook where > > - necessary). */ > > - if (dr_info->need_peeling_for_alignment) > > + /* If we have a grouped access we require that the alignment be VF * > > elem. */ > > + if (loop_vinfo > > + && dr_peeling_alignment (stmt_info) > > + && STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > { > > - /* Vector size in bytes. */ > > - poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > > (vectype)); > > - > > - /* We can only peel for loops, of course. */ > > - gcc_checking_assert (loop_vinfo); > > - > > - /* Calculate the number of vectors read per vector iteration. If > > - it is a power of two, multiply through to get the required > > - alignment in bytes. Otherwise, fail analysis since alignment > > - peeling wouldn't work in such a case. 
*/ > > - poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > - if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > - num_scalars *= DR_GROUP_SIZE (stmt_info); > > - > > - auto num_vectors = vect_get_num_vectors (num_scalars, vectype); > > - if (!pow2p_hwi (num_vectors)) > > - { > > - *result = opt_result::failure_at (vect_location, > > - "non-power-of-two num vectors %u " > > - "for DR needing peeling for " > > - "alignment at %G", > > - num_vectors, stmt_info->stmt); > > - return; > > - } > > - > > - safe_align *= num_vectors; > > - if (maybe_gt (safe_align, 4096U)) > > - { > > - pretty_printer pp; > > - pp_wide_integer (&pp, safe_align); > > - *result = opt_result::failure_at (vect_location, > > - "alignment required for correctness" > > - " (%s) may exceed page size", > > - pp_formatted_text (&pp)); > > - return; > > - } > > - > > - unsigned HOST_WIDE_INT multiple; > > - if (!constant_multiple_p (vector_alignment, safe_align, &multiple) > > - || !pow2p_hwi (multiple)) > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > + vector_alignment > > + = vf * TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > > I think we discussed this before, also when introducing peeling > for alignment support. This is incorrect for grouped accesses where > the number of scalar elements accessed is GROUP_SIZE * vf, so you > miss a multiplication by GROUP_SIZE here.
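The question below turns on what VF counts. In GCC, VF is the number of scalar loop iterations covered by one vector iteration, so a grouped access reads VF * GROUP_SIZE elements per vector iteration. For an LD4 of V4SI under that convention, VF is 4 and one vector iteration reads 4 * 4 = 16 ints, or 64 bytes. The helpers below are a hypothetical sketch, not GCC functions. They show only that arithmetic plus the power-of-two test and round-up discussed in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not GCC functions.
   VF is taken to be the number of scalar loop iterations covered by one
   vector iteration, so a grouped access reads VF * GROUP_SIZE elements.  */

/* Bytes read by one vector iteration of a grouped access.  */
static uint64_t
bytes_per_vector_iter (uint64_t vf, uint64_t group_size, uint64_t elem_size)
{
  return vf * group_size * elem_size;
}

/* True if X is a nonzero power of two.  */
static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= X.  X must be nonzero and at most 2^63.  */
static uint64_t
round_up_pow2 (uint64_t x)
{
  assert (x != 0 && x <= (UINT64_C (1) << 63));
  uint64_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}
```

LD4 of V4SI (VF = 4, GROUP_SIZE = 4, 4-byte ints) gives 64 bytes, which is a power of two. A group of 6 chars with VF = 16 gives 96 bytes; `round_up_pow2` lifts that to 128. The thread weighs two options: round up, or keep the target's preferred alignment and fix up or fail in vectorizable_load. The sketch only shows the arithmetic behind that choice.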
Huh, but doesn't your VF already include the group size? If I have an LD4 on V4SI, my VF is 16, isn't it? I still handle 16 elements per iteration, so why would I need 4 * 16? > > Note that this (and also your VF * element_size) can result in a > non-power-of-two value. > > That said, I'm quite sure we don't want to have a dr->target_alignment > that isn't power-of-two, so if the computation doesn't end up with a > power-of-two value we should leave it as the target prefers and > fixup (or fail) during vectorizable_load. Ack, I'll round up to a power of 2. > > > + if (dump_enabled_p ()) > > { > > - if (dump_enabled_p ()) > > - { > > - dump_printf_loc (MSG_NOTE, vect_location, > > - "forcing alignment for DR from preferred ("); > > - dump_dec (MSG_NOTE, vector_alignment); > > - dump_printf (MSG_NOTE, ") to safe align ("); > > - dump_dec (MSG_NOTE, safe_align); > > - dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt); > > - } > > - vector_alignment = safe_align; > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "alignment increased due to early break to "); > > + dump_dec (MSG_NOTE, vector_alignment); > > + dump_printf (MSG_NOTE, " bytes.\n"); > > } > > } > > > > @@ -2487,6 +2430,8 @@ vect_enhance_data_refs_alignment (loop_vec_info > loop_vinfo) > > || !slpeel_can_duplicate_loop_p (loop, LOOP_VINFO_IV_EXIT > > (loop_vinfo), > > loop_preheader_edge (loop)) > > || loop->inner > > + /* We don't currently maintain the LCSSA for prologue peeled > > inversed > > + loops.
*/ > > || LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)) > > do_peeling = false; > > > > @@ -2950,12 +2895,9 @@ vect_analyze_data_refs_alignment (loop_vec_info > loop_vinfo) > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt) > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt) > > continue; > > - opt_result res = opt_result::success (); > > + > > vect_compute_data_ref_alignment (loop_vinfo, dr_info, > > - STMT_VINFO_VECTYPE (dr_info->stmt), > > - &res); > > - if (!res) > > - return res; > > + STMT_VINFO_VECTYPE (dr_info->stmt)); > > } > > } > > > > @@ -7219,7 +7161,7 @@ vect_supportable_dr_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > > > > if (misalignment == 0) > > return dr_aligned; > > - else if (dr_info->need_peeling_for_alignment) > > + else if (dr_peeling_alignment (stmt_info)) > > return dr_unaligned_unsupported; > > > > /* For now assume all conditional loads/stores support unaligned > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > index > 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..436d373ae6ec06aff165a7bee3 > 7b3fa1dc95079b 100644 > > --- a/gcc/tree-vect-stmts.cc > > +++ b/gcc/tree-vect-stmts.cc > > @@ -2597,6 +2597,89 @@ get_load_store_type (vec_info *vinfo, > stmt_vec_info stmt_info, > > return false; > > } > > > > + /* If this DR needs peeling for alignment for correctness, we must > > + ensure the target alignment is a constant power-of-two multiple of the > > + amount read per vector iteration (overriding the above hook where > > + necessary). */ > > + if (dr_peeling_alignment (stmt_info)) > > + { > > + /* We can only peel for loops, of course. */ > > + gcc_checking_assert (loop_vinfo); > > + > > + /* Check if we support the operation if early breaks are needed. 
*/ > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) > > + && (*memory_access_type == VMAT_GATHER_SCATTER > > + || *memory_access_type == VMAT_STRIDED_SLP)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "early break not supported: cannot peel for " > > + "alignment. With non-contiguous memory > vectorization" > > + " could read out of bounds at %G ", > > + STMT_VINFO_STMT (stmt_info)); > > + return false; > > + } > > + > > + /* Even if uneven group sizes are aligned on the first load, the > > second > > + iteration won't be. As such reject uneven group sizes. */ > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info) > > + && (DR_GROUP_SIZE (stmt_info) % 2) == 1) > > Hmm, but a group size of 6 is even, but a vector size of four doesn't > make the 2nd aligned. So we need a power-of-two GROUP_SIZE * VF > and a byte alignment according to that. > Argg true. > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "early break not supported: uneven group size, " > > + "vectorization could read out of bounds at %G ", > > + STMT_VINFO_STMT (stmt_info)); > > + return false; > > + } > > + > > + /* Vector size in bytes. 
*/ > > + poly_uint64 safe_align; > > + if (nunits.is_constant ()) > > + safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype)); > > + else > > + safe_align = estimated_poly_value (LOOP_VINFO_VECT_FACTOR > (loop_vinfo), > > + POLY_VALUE_MAX); > > + > > + auto num_vectors = ncopies; > > + if (!pow2p_hwi (num_vectors)) > > + { > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "non-power-of-two num vectors %u " > > + "for DR needing peeling for " > > + "alignment at %G", > > + num_vectors, STMT_VINFO_STMT (stmt_info)); > > + return false; > > + } > > + > > + safe_align *= num_vectors; > > + bool inbounds > > + = ref_within_array_bound (STMT_VINFO_STMT (stmt_info), > > + DR_REF (STMT_VINFO_DATA_REF (stmt_info))); > > I'm again confused why you think ref_within_array_bound can be used to > validize anything? The goal here is this: if the access is aligned, does not exceed the page size, and we are able to increase the target alignment, then VLA is safe to use. Since we can't peel for SVE, an unaligned access can't be handled later. But as I mentioned, to SVE nothing is unaligned, because the target alignment is the element size. For non-SVE targets, unaligned accesses are handled later on; for SVE we would just generate wrong code. Since we know we can realign buffers of known size (since, well, they're in .data), we know we can let them through. I don't know of a better way to do this: again, for VLA the backend never reports that the load is misaligned, so nothing stops vectorization. The inbounds check is used as a proxy for that, which is what the comment below was trying to explain. > > > + /* For VLA we have to insert a runtime check that the vector loads > > + per iteration don't exceed a page size. For now we can use > > + POLY_VALUE_MAX as a proxy as we can't peel for VLA. */ > > + if (maybe_gt (safe_align, (unsigned)param_min_pagesize) > > + /* We don't support PFA for VLA at the moment.
Some targets like SVE > > + return a target alignment requirement of a single element. For > > + early break this is potentially unsafe so we can't count on > > + alignment rejecting such loops later as it thinks loads are never > > + misaligned. */ > > + || (!nunits.is_constant () && !inbounds)) > > + { > > + if (dump_enabled_p ()) > > + { > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > + "alignment required for correctness ("); > > + dump_dec (MSG_MISSED_OPTIMIZATION, safe_align); > > + dump_printf (MSG_NOTE, ") may exceed page size\n"); > > + } > > + return false; > > + } > > + *alignment_support_scheme = dr_unaligned_supported; > > and the only thing should be *alignment_support_scheme == dr_aligned, > and with a possibly too low target_alignment even that's not enough. > Fair, but you shouldn't be able to get there with a too-low target alignment, though. > Can you split out the testsuite part that just adds #pragma GCC novector? > That part is OK. > Ok, Tamar > Thanks, > Richard. > > > + } > > + > > if (*alignment_support_scheme == dr_unaligned_unsupported) > > { > > if (dump_enabled_p ()) > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > index > b0cb081cba0ae8b11fbfcfcb8c6d440ec451ccb5..aeaf714c155bc2d87bf50e6dba > 0dbfbcca027441 100644 > > --- a/gcc/tree-vectorizer.h > > +++ b/gcc/tree-vectorizer.h > > @@ -1998,6 +1998,33 @@ dr_target_alignment (dr_vec_info *dr_info) > > } > > #define DR_TARGET_ALIGNMENT(DR) dr_target_alignment (DR) > > > > +/* Return true if the stmt_vec_info requires peeling for alignment.
*/ > > +inline bool > > +dr_peeling_alignment (stmt_vec_info stmt_info) > > +{ > > + dr_vec_info *dr_info; > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > + else > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > + > > + return dr_info->need_peeling_for_alignment; > > +} > > + > > +/* Set need_peeling_for_alignment for the stmt_vec_info; for a group > > + access set it on the first element, otherwise set it on the DR directly. */ > > +inline void > > +dr_set_peeling_alignment (stmt_vec_info stmt_info, bool requires_alignment) > > +{ > > + dr_vec_info *dr_info; > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > + dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (stmt_info)); > > + else > > + dr_info = STMT_VINFO_DR_INFO (stmt_info); > > + > > + dr_info->need_peeling_for_alignment = requires_alignment; > > +} > > + > > inline void > > set_dr_target_alignment (dr_vec_info *dr_info, poly_uint64 val) > > { > > > > > > > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
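Two points from this exchange can be checked with plain address arithmetic:

- Richard's observation that an even group size such as 6 still leaves later vector iterations misaligned.
- Why an alignment used as a page-crossing guard should be a power of two that divides the page size. In the patch, `param_min_pagesize` stands in for the page size.

The helpers below are hypothetical. They assume a flat byte address space and are not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function.  Starting from an address that
   is ALIGN-byte aligned, each vector iteration advances by STEP bytes.
   Return true if the first ITERS iterations all start ALIGN-aligned.  */
static bool
every_iter_aligned (uint64_t step, uint64_t align, unsigned iters)
{
  assert (align != 0);
  uint64_t offset = 0;
  for (unsigned i = 0; i < iters; i++, offset += step)
    if (offset % align != 0)
      return false;
  return true;
}

/* Hypothetical helper: true if the NBYTES-byte block starting at ADDR
   straddles a PAGESIZE-byte page boundary.  */
static bool
crosses_page (uint64_t addr, uint64_t nbytes, uint64_t pagesize)
{
  assert (nbytes != 0 && pagesize != 0);
  return addr / pagesize != (addr + nbytes - 1) / pagesize;
}
```

Take 4-byte ints with four-lane vectors (VF = 4).

- **Group of 6:** each vector iteration reads 6 * 4 * 4 = 96 bytes. Even if the first block is aligned to the next power of two (128 bytes), the second starts at offset 96 and is not. Some later block also straddles a page: the one at offset 4032, a multiple of 96, crosses 4096.
- **Group of 4:** each vector iteration reads 64 bytes, so every block stays 64-byte aligned. Such a block never crosses a 4096-byte page.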