> -----Original Message----- > From: Thomas Schwinge <tho...@schwinge.name> > Sent: Monday, January 13, 2025 9:54 AM > To: Tamar Christina <tamar.christ...@arm.com>; Alex Coplan > <alex.cop...@arm.com>; gcc-patches@gcc.gnu.org > Cc: Andrew Stubbs <a...@baylibre.com> > Subject: Re: [gcc r15-6807] vect: Force alignment peeling to vectorize more > early > break loops [PR118211] > > Hi! > > On 2025-01-10T21:22:03+0000, Tamar Christina via Gcc-cvs <gcc- > c...@gcc.gnu.org> wrote: > > https://gcc.gnu.org/g:68326d5d1a593dc0bf098c03aac25916168bc5a9 > > > > commit r15-6807-g68326d5d1a593dc0bf098c03aac25916168bc5a9 > > Author: Alex Coplan <alex.cop...@arm.com> > > Date: Mon Mar 11 13:09:10 2024 +0000 > > > > vect: Force alignment peeling to vectorize more early break loops > > [PR118211] > > In addition to the regression already noted elsewhere: > > PASS: gcc.dg/tree-ssa/predcom-8.c (test for excess errors) > PASS: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump pcom "Executing > predictive > commoning without unrolling" > [-PASS:-]{+FAIL:+} gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom > "Invalid sum" > > ..., this commit for '--target=amdgcn-amdhsa' (tested > '-march=gfx908', '-march=gfx1100') also regresses: > > PASS: gcc.dg/vect/vect-switch-search-line-fast.c (test for excess errors) > [-XFAIL:-]{+FAIL:+} gcc.dg/vect/vect-switch-search-line-fast.c > scan-tree-dump- > times vect "vectorized 1 loops" [-1-]{+0+} > > gcc.dg/vect/vect-switch-search-line-fast.c: pattern found 1 times > > > --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > [...] > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail > > *-*-* } } > } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { > > target { ilp32 > } } } } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { > > target { ! 
> ilp32 } } } } */ > > Presuming that it's correct that GCN continues to be able to vectorize this, > what is the appropriate conditional to use? >
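For concreteness, the two dg-final lines in vect-switch-search-line-fast.c could be adjusted along the following lines. This is only a sketch and not the committed fix: the amdgcn*-*-* triplet glob and the negated second selector are assumptions, written in the same style as the aarch64*-*-* selectors used elsewhere in this patch, and they would need checking on GCN (gfx908, gfx1100) and on an ilp32 target.

```c
/* Assumed selector: expect the loop to vectorize on ilp32 targets and on GCN.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ilp32 || { amdgcn*-*-* } } } } } */
/* Assumed complement: on every other target, expect no vectorization.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! { ilp32 || { amdgcn*-*-* } } } } } } */
```

The second directive has to negate the whole expression. If it stayed as { ! ilp32 }, GCN would match both lines and one of the two scans would fail.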
I don't think we really have a condition for its succeeding on only some targets for now. The original testcase was xfail'd, but it was failing for many different reasons on all targets.

So I think just using { target { ilp32 || { amdgcn-* } } } should work for now.

Thanks,
Tamar

> > Grüße > Thomas > > > > This allows us to vectorize more loops with early exits by forcing > > peeling for alignment to make sure that we're guaranteed to be able to > > safely read an entire vector iteration without crossing a page boundary. > > > > To make this work for VLA architectures we have to allow compile-time > > non-constant target alignments. We also have to override the result of > > the target's preferred_vector_alignment hook if it isn't a power-of-two > > multiple of the TYPE_SIZE of the chosen vector type. > > > > gcc/ChangeLog: > > > > PR tree-optimization/118211 > > PR tree-optimization/116126 > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): > > Set need_peeling_for_alignment flag on read DRs instead of > > failing vectorization. Punt on gathers. > > (dr_misalignment): Handle non-constant target alignments. > > (vect_compute_data_ref_alignment): If need_peeling_for_alignment > > flag is set on the DR, then override the target alignment chosen > > by the preferred_vector_alignment hook to choose a safe > > alignment. > > (vect_supportable_dr_alignment): Override > > support_vector_misalignment hook if need_peeling_for_alignment > > is set on the DR: in this case we must return > > dr_unaligned_unsupported in order to force peeling. > > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog > > peeling by a compile-time non-constant amount. > > * tree-vectorizer.h (dr_vec_info): Add new flag > > need_peeling_for_alignment. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/118211 > > PR tree-optimization/116126 > > * gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize. > > * gcc.dg/tree-ssa/cunroll-14.c: Likewise. > > * gcc.dg/unroll-6.c: Likewise. 
> > * gcc.dg/tree-ssa/gen-vect-28.c: Likewise. > > * gcc.dg/vect/vect-104.c: Expect to vectorize. > > * gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise. > > * gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise. > > * gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise. > > * gcc.dg/vect/vect-early-break_3.c: Likewise. > > * gcc.dg/vect/vect-early-break_65.c: Likewise. > > * gcc.dg/vect/vect-early-break_8.c: Likewise. > > * gfortran.dg/vect/vect-5.f90: Likewise. > > * gfortran.dg/vect/vect-8.f90: Likewise. > > * gcc.dg/vect/vect-switch-search-line-fast.c: > > > > Co-Authored-By: Tamar Christina <tamar.christ...@arm.com> > > > > Diff: > > --- > > gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c | 2 +- > > gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c | 2 +- > > gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c | 1 + > > gcc/testsuite/gcc.dg/unroll-6.c | 2 +- > > gcc/testsuite/gcc.dg/vect/vect-104.c | 1 + > > .../gcc.dg/vect/vect-early-break_108-pr113588.c | 2 +- > > .../gcc.dg/vect/vect-early-break_109-pr113588.c | 2 +- > > .../gcc.dg/vect/vect-early-break_110-pr113467.c | 2 +- > > gcc/testsuite/gcc.dg/vect/vect-early-break_3.c | 2 +- > > gcc/testsuite/gcc.dg/vect/vect-early-break_65.c | 2 +- > > gcc/testsuite/gcc.dg/vect/vect-early-break_8.c | 2 +- > > .../gcc.dg/vect/vect-switch-search-line-fast.c | 3 +- > > gcc/testsuite/gfortran.dg/vect/vect-5.f90 | 1 + > > gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 5 +- > > gcc/tree-vect-data-refs.cc | 113 > > ++++++++++++++++++--- > > gcc/tree-vect-loop-manip.cc | 6 -- > > gcc/tree-vectorizer.h | 5 + > > 17 files changed, 119 insertions(+), 34 deletions(-) > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c > b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c > > index 98cb56a8564b..154e2963f12d 100644 > > --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } 
*/ > > +/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details -fno-tree- > vectorize" } */ > > > > #if __SIZEOF_INT__ < 4 > > __extension__ typedef __INT32_TYPE__ i32; > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c > b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c > > index 5f112da310c8..4b369f7ad278 100644 > > --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O3 -fdump-tree-cunroll-blocks-details" } */ > > +/* { dg-options "-O3 -fdump-tree-cunroll-blocks-details > > -fno-tree-vectorize" } */ > > struct a {int a[100];}; > > void > > t(struct a *a) > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c > b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c > > index c5f1b5aff115..5c0ea58a7b00 100644 > > --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c > > @@ -20,6 +20,7 @@ int main_1 (int off) > > } > > > > /* check results: */ > > +#pragma GCC novector > > for (i = 0; i < N; i++) > > { > > if (ia[i+off] != 5) > > diff --git a/gcc/testsuite/gcc.dg/unroll-6.c > > b/gcc/testsuite/gcc.dg/unroll-6.c > > index 7664bbff109f..7be1b7cfadba 100644 > > --- a/gcc/testsuite/gcc.dg/unroll-6.c > > +++ b/gcc/testsuite/gcc.dg/unroll-6.c > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details-blocks > > -funroll-loops" } */ > > +/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details-blocks -funroll-loops > > -fno- > tree-vectorize" } */ > > /* { dg-require-effective-target int32plus } */ > > > > void abort (void); > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-104.c > b/gcc/testsuite/gcc.dg/vect/vect-104.c > > index 730efd39bd4a..8890a5da180b 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-104.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-104.c > > @@ -46,6 +46,7 @@ int main1 (int x) { > > #pragma GCC novector > > for (i = 0; i < N; i++) > > { 
> > +#pragma GCC novector > > for (j = 0; j < N; j++) > > { > > if (p->a[i][j] != c[i][j]) > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c > > index e488619c9aac..78b22f3b43b4 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c > > @@ -3,7 +3,7 @@ > > /* { dg-require-effective-target vect_early_break } */ > > /* { dg-require-effective-target vect_int } */ > > > > -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > int foo (const char *s, unsigned long n) > > { > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c > > index 488c19d3ede8..2347fc26a14f 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c > > @@ -3,7 +3,7 @@ > > /* { dg-require-effective-target vect_int } */ > > /* { dg-require-effective-target mmap } */ > > > > -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > #include <sys/mman.h> > > #include <unistd.h> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c > > index 12d0ea1e871b..4f5a87c3ab94 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c > > @@ -2,7 +2,7 @@ > > /* { dg-require-effective-target vect_early_break } */ > > /* { dg-require-effective-target vect_long_long } */ > > > > -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > #include "tree-vect.h" > 
> #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c > > index 4afbc7266765..9d6cd0a191f6 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c > > @@ -5,7 +5,7 @@ > > > > /* { dg-additional-options "-Ofast" } */ > > > > -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > unsigned test4(char x, char *vect, int n) > > { > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c > > index fa87999dcd4c..8763a5ff04ec 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c > > @@ -17,4 +17,4 @@ void f() { > > return; > > } > > > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c > > index 84e19423e2e6..541f439a9b49 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c > > @@ -5,7 +5,7 @@ > > > > /* { dg-additional-options "-Ofast" } */ > > > > -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > > > > #include <complex.h> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > index 15f3a4ef38a7..02ad7a451ca2 100644 > > --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c > > @@ -14,4 +14,5 @@ const unsigned char *search_line_fast2 (const unsigned > char *s, > > return s; > > } > > > 
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail > > *-*-* } } > } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { > > target { ilp32 > } } } } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { > > target { ! > ilp32 } } } } */ > > diff --git a/gcc/testsuite/gfortran.dg/vect/vect-5.f90 > b/gcc/testsuite/gfortran.dg/vect/vect-5.f90 > > index b11cabaee23d..cca4875b859b 100644 > > --- a/gcc/testsuite/gfortran.dg/vect/vect-5.f90 > > +++ b/gcc/testsuite/gfortran.dg/vect/vect-5.f90 > > @@ -18,6 +18,7 @@ > > end do > > > > do I = 1, N > > +!GCC$ novector > > do J = I, M > > if (A(J,2) /= B(J)) then > > STOP 1 > > diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 > b/gcc/testsuite/gfortran.dg/vect/vect-8.f90 > > index 918eddee292f..d4ce44feb4b9 100644 > > --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 > > +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90 > > @@ -706,7 +706,6 @@ CALL track('KERNEL ') > > RETURN > > END SUBROUTINE kernel > > > > -! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" { > > target > aarch64_sve } } } > > -! { dg-final { scan-tree-dump-times "vectorized 2\[45\] loops" 1 "vect" { > > target { > aarch64*-*-* && { ! aarch64_sve } } } } } > > -! { dg-final { scan-tree-dump-times "vectorized 2\[3456\] loops" 1 "vect" { > target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } } > > +! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" { > > target > aarch64*-*-* } } } > > +! { dg-final { scan-tree-dump-times "vectorized 2\[34567\] loops" 1 "vect" > > { > target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } } > > ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { > > target { { ! > vect_intdouble_cvt } && { ! 
aarch64*-*-* } } } } } > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > > index c10508de5554..6eda40267bd1 100644 > > --- a/gcc/tree-vect-data-refs.cc > > +++ b/gcc/tree-vect-data-refs.cc > > @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see > > #include "optabs-tree.h" > > #include "cgraph.h" > > #include "dumpfile.h" > > +#include "pretty-print.h" > > #include "alias.h" > > #include "fold-const.h" > > #include "stor-layout.h" > > @@ -750,15 +751,23 @@ vect_analyze_early_break_dependences > (loop_vec_info loop_vinfo) > > if (DR_IS_READ (dr_ref) > > && !ref_within_array_bound (stmt, DR_REF (dr_ref))) > > { > > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo) > > + || STMT_VINFO_STRIDED_P (stmt_vinfo)) > > + { > > + const char *msg > > + = "early break not supported: cannot peel " > > + "for alignment, vectorization would read out of " > > + "bounds at %G"; > > + return opt_result::failure_at (stmt, msg, stmt); > > + } > > + > > + dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo); > > + dr_info->need_peeling_for_alignment = true; > > + > > if (dump_enabled_p ()) > > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > - "early breaks not supported: vectorization " > > - "would %s beyond size of obj.\n", > > - DR_IS_READ (dr_ref) ? "read" : "write"); > > - return opt_result::failure_at (stmt, > > - "can't safely apply code motion to " > > - "dependencies of %G to vectorize " > > - "the early exit.\n", stmt); > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "marking DR (read) as needing peeling for " > > + "alignment at %G", stmt); > > } > > > > if (DR_IS_READ (dr_ref)) > > @@ -1241,11 +1250,15 @@ dr_misalignment (dr_vec_info *dr_info, tree > vectype, poly_int64 offset) > > offset which can for example result from a negative stride access. 
*/ > > poly_int64 misalignment = misalign + diff + offset; > > > > - /* vect_compute_data_ref_alignment will have ensured that > > target_alignment > > - is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN. > */ > > - unsigned HOST_WIDE_INT target_alignment_c > > - = dr_info->target_alignment.to_constant (); > > - if (!known_misalignment (misalignment, target_alignment_c, &misalign)) > > + /* Below we reject compile-time non-constant target alignments, but if > > + our misalignment is zero, then we are known to already be aligned > > + w.r.t. any such possible target alignment. */ > > + if (known_eq (misalignment, 0)) > > + return 0; > > + > > + unsigned HOST_WIDE_INT target_alignment_c; > > + if (!dr_info->target_alignment.is_constant (&target_alignment_c) > > + || !known_misalignment (misalignment, target_alignment_c, &misalign)) > > return DR_MISALIGNMENT_UNKNOWN; > > return misalign; > > } > > @@ -1313,6 +1326,9 @@ vect_record_base_alignments (vec_info *vinfo) > > Compute the misalignment of the data reference DR_INFO when vectorizing > > with VECTYPE. > > > > + RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT > > will > > + be set appropriately on failure (but is otherwise left unchanged). > > + > > Output: > > 1. 
initialized misalignment info for DR_INFO > > > > @@ -1321,7 +1337,7 @@ vect_record_base_alignments (vec_info *vinfo) > > > > static void > > vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info, > > - tree vectype) > > + tree vectype, opt_result *result = nullptr) > > { > > stmt_vec_info stmt_info = dr_info->stmt; > > vec_base_alignments *base_alignments = &vinfo->base_alignments; > > @@ -1348,6 +1364,67 @@ vect_compute_data_ref_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > > poly_uint64 vector_alignment > > = exact_div (targetm.vectorize.preferred_vector_alignment (vectype), > > BITS_PER_UNIT); > > + > > + /* If this DR needs peeling for alignment for correctness, we must > > + ensure the target alignment is a constant power-of-two multiple of the > > + amount read per vector iteration (overriding the above hook where > > + necessary). */ > > + if (dr_info->need_peeling_for_alignment) > > + { > > + /* Vector size in bytes. */ > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > > (vectype)); > > + > > + /* We can only peel for loops, of course. */ > > + gcc_checking_assert (loop_vinfo); > > + > > + /* Calculate the number of vectors read per vector iteration. If > > + it is a power of two, multiply through to get the required > > + alignment in bytes. Otherwise, fail analysis since alignment > > + peeling wouldn't work in such a case. 
*/ > > + poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > + num_scalars *= DR_GROUP_SIZE (stmt_info); > > + > > + auto num_vectors = vect_get_num_vectors (num_scalars, vectype); > > + if (!pow2p_hwi (num_vectors)) > > + { > > + *result = opt_result::failure_at (vect_location, > > + "non-power-of-two num vectors %u " > > + "for DR needing peeling for " > > + "alignment at %G", > > + num_vectors, stmt_info->stmt); > > + return; > > + } > > + > > + safe_align *= num_vectors; > > + if (maybe_gt (safe_align, 4096U)) > > + { > > + pretty_printer pp; > > + pp_wide_integer (&pp, safe_align); > > + *result = opt_result::failure_at (vect_location, > > + "alignment required for correctness" > > + " (%s) may exceed page size", > > + pp_formatted_text (&pp)); > > + return; > > + } > > + > > + unsigned HOST_WIDE_INT multiple; > > + if (!constant_multiple_p (vector_alignment, safe_align, &multiple) > > + || !pow2p_hwi (multiple)) > > + { > > + if (dump_enabled_p ()) > > + { > > + dump_printf_loc (MSG_NOTE, vect_location, > > + "forcing alignment for DR from preferred ("); > > + dump_dec (MSG_NOTE, vector_alignment); > > + dump_printf (MSG_NOTE, ") to safe align ("); > > + dump_dec (MSG_NOTE, safe_align); > > + dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt); > > + } > > + vector_alignment = safe_align; > > + } > > + } > > + > > SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment); > > > > /* If the main loop has peeled for alignment we have no way of knowing > > @@ -2865,8 +2942,12 @@ vect_analyze_data_refs_alignment (loop_vec_info > loop_vinfo) > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt) > > && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt) > > continue; > > + opt_result res = opt_result::success (); > > vect_compute_data_ref_alignment (loop_vinfo, dr_info, > > - STMT_VINFO_VECTYPE (dr_info->stmt)); > > + STMT_VINFO_VECTYPE (dr_info->stmt), > > + &res); > > + if (!res) > > + return res; > 
> } > > } > > > > @@ -7130,6 +7211,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > > > > if (misalignment == 0) > > return dr_aligned; > > + else if (dr_info->need_peeling_for_alignment) > > + return dr_unaligned_unsupported; > > > > /* For now assume all conditional loads/stores support unaligned > > access without any special code. */ > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > > index 5d1b70aea43c..15cac0fe27df 100644 > > --- a/gcc/tree-vect-loop-manip.cc > > +++ b/gcc/tree-vect-loop-manip.cc > > @@ -3128,12 +3128,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > > int estimated_vf; > > int prolog_peeling = 0; > > bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL; > > - /* We currently do not support prolog peeling if the target alignment is > > not > > - known at compile time. 'vect_gen_prolog_loop_niters' depends on the > > - target alignment being constant. */ > > - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo); > > - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ()) > > - return NULL; > > > > if (!vect_use_loop_mask_for_alignment_p (loop_vinfo)) > > prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo); > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > index 135eb119ca2e..79db02a39a8f 100644 > > --- a/gcc/tree-vectorizer.h > > +++ b/gcc/tree-vectorizer.h > > @@ -1278,6 +1278,11 @@ public: > > poly_uint64 target_alignment; > > /* If true the alignment of base_decl needs to be increased. */ > > bool base_misaligned; > > + > > + /* Set by early break vectorization when this DR needs peeling for > > alignment > > + for correctness. */ > > + bool need_peeling_for_alignment; > > + > > tree base_decl; > > > > /* Stores current vectorized loop's offset. To be added to the DR's