Hi!

On 2025-01-10T21:22:03+0000, Tamar Christina via Gcc-cvs <gcc-...@gcc.gnu.org> wrote:
> https://gcc.gnu.org/g:68326d5d1a593dc0bf098c03aac25916168bc5a9
>
> commit r15-6807-g68326d5d1a593dc0bf098c03aac25916168bc5a9
> Author: Alex Coplan <alex.cop...@arm.com>
> Date: Mon Mar 11 13:09:10 2024 +0000
>
> vect: Force alignment peeling to vectorize more early break loops
> [PR118211]

In addition to the regression already noted elsewhere:

    PASS: gcc.dg/tree-ssa/predcom-8.c (test for excess errors)
    PASS: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump pcom "Executing predictive commoning without unrolling"
    [-PASS:-]{+FAIL:+} gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom "Invalid sum"

..., this commit, for '--target=amdgcn-amdhsa' (tested '-march=gfx908' and
'-march=gfx1100'), also regresses:

    PASS: gcc.dg/vect/vect-switch-search-line-fast.c (test for excess errors)
    [-XFAIL:-]{+FAIL:+} gcc.dg/vect/vect-switch-search-line-fast.c scan-tree-dump-times vect "vectorized 1 loops" [-1-]{+0+}
    gcc.dg/vect/vect-switch-search-line-fast.c: pattern found 1 times

> --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> [...]
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target
> { ilp32 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target
> { ! ilp32 } } } } */

Presuming that it's correct that GCN continues to be able to vectorize this,
what is the appropriate conditional to use here?


Grüße
 Thomas


> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
>
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments. We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't a power-of-two
> multiple of the TYPE_SIZE of the chosen vector type.
>
> gcc/ChangeLog:
>
> PR tree-optimization/118211
> PR tree-optimization/116126
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> Set need_peeling_for_alignment flag on read DRs instead of
> failing vectorization. Punt on gathers.
> (dr_misalignment): Handle non-constant target alignments.
> (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> flag is set on the DR, then override the target alignment chosen
> by the preferred_vector_alignment hook to choose a safe
> alignment.
> (vect_supportable_dr_alignment): Override
> support_vector_misalignment hook if need_peeling_for_alignment
> is set on the DR: in this case we must return
> dr_unaligned_unsupported in order to force peeling.
> * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> peeling by a compile-time non-constant amount.
> * tree-vectorizer.h (dr_vec_info): Add new flag
> need_peeling_for_alignment.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/118211
> PR tree-optimization/116126
> * gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
> * gcc.dg/tree-ssa/cunroll-14.c: Likewise.
> * gcc.dg/unroll-6.c: Likewise.
> * gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
> * gcc.dg/vect/vect-104.c: Expect to vectorize.
> * gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
> * gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
> * gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
> * gcc.dg/vect/vect-early-break_3.c: Likewise.
> * gcc.dg/vect/vect-early-break_65.c: Likewise.
> * gcc.dg/vect/vect-early-break_8.c: Likewise.
> * gfortran.dg/vect/vect-5.f90: Likewise.
> * gfortran.dg/vect/vect-8.f90: Likewise.
> * gcc.dg/vect/vect-switch-search-line-fast.c:
>
> Co-Authored-By: Tamar Christina <tamar.christ...@arm.com>
>
> Diff:
> ---
> gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c      |   2 +-
> gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c      |   2 +-
> gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c     |   1 +
> gcc/testsuite/gcc.dg/unroll-6.c                 |   2 +-
> gcc/testsuite/gcc.dg/vect/vect-104.c            |   1 +
> .../gcc.dg/vect/vect-early-break_108-pr113588.c |   2 +-
> .../gcc.dg/vect/vect-early-break_109-pr113588.c |   2 +-
> .../gcc.dg/vect/vect-early-break_110-pr113467.c |   2 +-
> gcc/testsuite/gcc.dg/vect/vect-early-break_3.c  |   2 +-
> gcc/testsuite/gcc.dg/vect/vect-early-break_65.c |   2 +-
> gcc/testsuite/gcc.dg/vect/vect-early-break_8.c  |   2 +-
> .../gcc.dg/vect/vect-switch-search-line-fast.c  |   3 +-
> gcc/testsuite/gfortran.dg/vect/vect-5.f90       |   1 +
> gcc/testsuite/gfortran.dg/vect/vect-8.f90       |   5 +-
> gcc/tree-vect-data-refs.cc                      | 113 ++++++++++++++++++---
> gcc/tree-vect-loop-manip.cc                     |   6 --
> gcc/tree-vectorizer.h                           |   5 +
> 17 files changed, 119 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> index 98cb56a8564b..154e2963f12d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } */
> +/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details
> -fno-tree-vectorize" } */
>
> #if __SIZEOF_INT__ < 4
> __extension__ typedef __INT32_TYPE__ i32;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c
> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c
> index 5f112da310c8..4b369f7ad278 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-14.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O3 -fdump-tree-cunroll-blocks-details" } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-blocks-details -fno-tree-vectorize"
> } */
> struct a {int a[100];};
> void
> t(struct a *a)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> index c5f1b5aff115..5c0ea58a7b00 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> @@ -20,6 +20,7 @@ int main_1 (int off)
> }
>
> /* check results: */
> +#pragma GCC novector
> for (i = 0; i < N; i++)
> {
> if (ia[i+off] != 5)
> diff --git a/gcc/testsuite/gcc.dg/unroll-6.c b/gcc/testsuite/gcc.dg/unroll-6.c
> index 7664bbff109f..7be1b7cfadba 100644
> --- a/gcc/testsuite/gcc.dg/unroll-6.c
> +++ b/gcc/testsuite/gcc.dg/unroll-6.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details-blocks -funroll-loops"
> } */
> +/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details-blocks -funroll-loops
> -fno-tree-vectorize" } */
> /* { dg-require-effective-target int32plus } */
>
> void abort (void);
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-104.c
> b/gcc/testsuite/gcc.dg/vect/vect-104.c
> index 730efd39bd4a..8890a5da180b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-104.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-104.c
> @@ -46,6 +46,7 @@ int main1 (int x) {
> #pragma GCC novector
> for (i = 0; i < N; i++)
> {
> +#pragma GCC novector
> for (j = 0; j < N; j++)
> {
> if (p->a[i][j] != c[i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
> index e488619c9aac..78b22f3b43b4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_108-pr113588.c
> @@ -3,7 +3,7 @@
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_int } */
>
> -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>
> int foo (const char *s, unsigned long n)
> {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
> index 488c19d3ede8..2347fc26a14f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_109-pr113588.c
> @@ -3,7 +3,7 @@
> /* { dg-require-effective-target vect_int } */
> /* { dg-require-effective-target mmap } */
>
> -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>
> #include <sys/mman.h>
> #include <unistd.h>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c
> index 12d0ea1e871b..4f5a87c3ab94 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_110-pr113467.c
> @@ -2,7 +2,7 @@
> /* { dg-require-effective-target vect_early_break } */
> /* { dg-require-effective-target vect_long_long } */
>
> -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>
> #include "tree-vect.h"
> #include <stdint.h>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> index 4afbc7266765..9d6cd0a191f6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> @@ -5,7 +5,7 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>
> unsigned test4(char x, char *vect, int n)
> {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> index fa87999dcd4c..8763a5ff04ec 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> @@ -17,4 +17,4 @@ void f() {
> return;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> index 84e19423e2e6..541f439a9b49 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> @@ -5,7 +5,7 @@
>
> /* { dg-additional-options "-Ofast" } */
>
> -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>
> #include <complex.h>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> index 15f3a4ef38a7..02ad7a451ca2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> @@ -14,4 +14,5 @@ const unsigned char *search_line_fast2 (const unsigned char
> *s,
> return s;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target
> { ilp32 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target
> { ! ilp32 } } } } */
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-5.f90
> b/gcc/testsuite/gfortran.dg/vect/vect-5.f90
> index b11cabaee23d..cca4875b859b 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-5.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-5.f90
> @@ -18,6 +18,7 @@
> end do
>
> do I = 1, N
> +!GCC$ novector
> do J = I, M
> if (A(J,2) /= B(J)) then
> STOP 1
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> index 918eddee292f..d4ce44feb4b9 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> @@ -706,7 +706,6 @@ CALL track('KERNEL ')
> RETURN
> END SUBROUTINE kernel
>
> -! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" {
> target aarch64_sve } } }
> -! { dg-final { scan-tree-dump-times "vectorized 2\[45\] loops" 1 "vect" {
> target { aarch64*-*-* && { ! aarch64_sve } } } } }
> -! { dg-final { scan-tree-dump-times "vectorized 2\[3456\] loops" 1 "vect" {
> target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> +! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" {
> target aarch64*-*-* } } }
> +! { dg-final { scan-tree-dump-times "vectorized 2\[34567\] loops" 1 "vect" {
> target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target
> { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index c10508de5554..6eda40267bd1 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
> #include "optabs-tree.h"
> #include "cgraph.h"
> #include "dumpfile.h"
> +#include "pretty-print.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -750,15 +751,23 @@ vect_analyze_early_break_dependences (loop_vec_info
> loop_vinfo)
> if (DR_IS_READ (dr_ref)
> && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> {
> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)
> + || STMT_VINFO_STRIDED_P (stmt_vinfo))
> + {
> + const char *msg
> + = "early break not supported: cannot peel "
> + "for alignment, vectorization would read out of "
> + "bounds at %G";
> + return opt_result::failure_at (stmt, msg, stmt);
> + }
> +
> + dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
> + dr_info->need_peeling_for_alignment = true;
> +
> if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "early breaks not supported: vectorization "
> - "would %s beyond size of obj.\n",
> - DR_IS_READ (dr_ref) ? "read" : "write");
> - return opt_result::failure_at (stmt,
> - "can't safely apply code motion to "
> - "dependencies of %G to vectorize "
> - "the early exit.\n", stmt);
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "marking DR (read) as needing peeling for "
> + "alignment at %G", stmt);
> }
>
> if (DR_IS_READ (dr_ref))
> @@ -1241,11 +1250,15 @@ dr_misalignment (dr_vec_info *dr_info, tree vectype,
> poly_int64 offset)
> offset which can for example result from a negative stride access. */
> poly_int64 misalignment = misalign + diff + offset;
>
> - /* vect_compute_data_ref_alignment will have ensured that target_alignment
> - is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN. */
> - unsigned HOST_WIDE_INT target_alignment_c
> - = dr_info->target_alignment.to_constant ();
> - if (!known_misalignment (misalignment, target_alignment_c, &misalign))
> + /* Below we reject compile-time non-constant target alignments, but if
> + our misalignment is zero, then we are known to already be aligned
> + w.r.t. any such possible target alignment. */
> + if (known_eq (misalignment, 0))
> + return 0;
> +
> + unsigned HOST_WIDE_INT target_alignment_c;
> + if (!dr_info->target_alignment.is_constant (&target_alignment_c)
> + || !known_misalignment (misalignment, target_alignment_c, &misalign))
> return DR_MISALIGNMENT_UNKNOWN;
> return misalign;
> }
> @@ -1313,6 +1326,9 @@ vect_record_base_alignments (vec_info *vinfo)
> Compute the misalignment of the data reference DR_INFO when vectorizing
> with VECTYPE.
>
> + RESULT is non-NULL iff VINFO is a loop_vec_info. In that case, *RESULT
> will
> + be set appropriately on failure (but is otherwise left unchanged).
> +
> Output:
> 1. initialized misalignment info for DR_INFO
>
> @@ -1321,7 +1337,7 @@ vect_record_base_alignments (vec_info *vinfo)
>
> static void
> vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
> - tree vectype)
> + tree vectype, opt_result *result = nullptr)
> {
> stmt_vec_info stmt_info = dr_info->stmt;
> vec_base_alignments *base_alignments = &vinfo->base_alignments;
> @@ -1348,6 +1364,67 @@ vect_compute_data_ref_alignment (vec_info *vinfo,
> dr_vec_info *dr_info,
> poly_uint64 vector_alignment
> = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
> BITS_PER_UNIT);
> +
> + /* If this DR needs peeling for alignment for correctness, we must
> + ensure the target alignment is a constant power-of-two multiple of the
> + amount read per vector iteration (overriding the above hook where
> + necessary). */
> + if (dr_info->need_peeling_for_alignment)
> + {
> + /* Vector size in bytes. */
> + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> (vectype));
> +
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + /* Calculate the number of vectors read per vector iteration. If
> + it is a power of two, multiply through to get the required
> + alignment in bytes. Otherwise, fail analysis since alignment
> + peeling wouldn't work in such a case. */
> + poly_uint64 num_scalars = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + num_scalars *= DR_GROUP_SIZE (stmt_info);
> +
> + auto num_vectors = vect_get_num_vectors (num_scalars, vectype);
> + if (!pow2p_hwi (num_vectors))
> + {
> + *result = opt_result::failure_at (vect_location,
> + "non-power-of-two num vectors %u "
> + "for DR needing peeling for "
> + "alignment at %G",
> + num_vectors, stmt_info->stmt);
> + return;
> + }
> +
> + safe_align *= num_vectors;
> + if (maybe_gt (safe_align, 4096U))
> + {
> + pretty_printer pp;
> + pp_wide_integer (&pp, safe_align);
> + *result = opt_result::failure_at (vect_location,
> + "alignment required for correctness"
> + " (%s) may exceed page size",
> + pp_formatted_text (&pp));
> + return;
> + }
> +
> + unsigned HOST_WIDE_INT multiple;
> + if (!constant_multiple_p (vector_alignment, safe_align, &multiple)
> + || !pow2p_hwi (multiple))
> + {
> + if (dump_enabled_p ())
> + {
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "forcing alignment for DR from preferred (");
> + dump_dec (MSG_NOTE, vector_alignment);
> + dump_printf (MSG_NOTE, ") to safe align (");
> + dump_dec (MSG_NOTE, safe_align);
> + dump_printf (MSG_NOTE, ") for stmt: %G", stmt_info->stmt);
> + }
> + vector_alignment = safe_align;
> + }
> + }
> +
> SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
>
> /* If the main loop has peeled for alignment we have no way of knowing
> @@ -2865,8 +2942,12 @@ vect_analyze_data_refs_alignment (loop_vec_info
> loop_vinfo)
> if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)
> && DR_GROUP_FIRST_ELEMENT (dr_info->stmt) != dr_info->stmt)
> continue;
> + opt_result res = opt_result::success ();
> vect_compute_data_ref_alignment (loop_vinfo, dr_info,
> - STMT_VINFO_VECTYPE (dr_info->stmt));
> + STMT_VINFO_VECTYPE (dr_info->stmt),
> + &res);
> + if (!res)
> + return res;
> }
> }
>
> @@ -7130,6 +7211,8 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> dr_vec_info *dr_info,
>
> if (misalignment == 0)
> return dr_aligned;
> + else if (dr_info->need_peeling_for_alignment)
> + return dr_unaligned_unsupported;
>
> /* For now assume all conditional loads/stores support unaligned
> access without any special code. */
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 5d1b70aea43c..15cac0fe27df 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3128,12 +3128,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> int estimated_vf;
> int prolog_peeling = 0;
> bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
> - /* We currently do not support prolog peeling if the target alignment is
> not
> - known at compile time. 'vect_gen_prolog_loop_niters' depends on the
> - target alignment being constant. */
> - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
> - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ())
> - return NULL;
>
> if (!vect_use_loop_mask_for_alignment_p (loop_vinfo))
> prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 135eb119ca2e..79db02a39a8f 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1278,6 +1278,11 @@ public:
> poly_uint64 target_alignment;
> /* If true the alignment of base_decl needs to be increased. */
> bool base_misaligned;
> +
> + /* Set by early break vectorization when this DR needs peeling for
> alignment
> + for correctness. */
> + bool need_peeling_for_alignment;
> +
> tree base_decl;
>
> /* Stores current vectorized loop's offset. To be added to the DR's