"Andre Vieira (lists)" <andre.simoesdiasvie...@arm.com> writes:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
> uses an unpredicated (all-true predicate for SVE) main loop and a
> predicated tail loop. The way this was implemented seems to mean it
> re-uses the same vector-mode for both loops, which means the tail loop
> isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue
> loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the
> main users of the --param vect-partial-vector-usage=1 option. Can
> someone help me test (and maybe even benchmark?) this on a PPC target?
>
> Kind regards,
> Andre
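For readers less familiar with the loop shape described above, here is a scalar C model of an unpredicated main loop followed by a predicated tail loop. It is only a sketch under stated assumptions, not GCC's generated code: the function name add_arrays, the fixed vector factor VF of 8 and the per-lane `if` (standing in for an SVE WHILELO predicate) are all hypothetical. It shows why, when both loops use the same vector factor, the predicated tail runs at most once.

```c
#include <assert.h>

/* Hypothetical vector factor shared by the main loop and the epilogue,
   e.g. eight 16-bit lanes in a 128-bit vector.  */
#define VF 8u

/* Scalar model of c[i] = a[i] + b[i] vectorized with an unpredicated
   main loop and a predicated epilogue.  Returns how many times the
   epilogue body ran.  */
static unsigned
add_arrays (const short *a, const short *b, short *c, unsigned n)
{
  unsigned i = 0;

  /* Unpredicated main loop: only whole vectors of VF elements.  */
  for (; i + VF <= n; i += VF)
    for (unsigned lane = 0; lane < VF; ++lane)
      c[i + lane] = (short) (a[i + lane] + b[i + lane]);

  /* Predicated epilogue: lane L is active only while i + L < n,
     which is what a WHILELO-style predicate computes.  */
  unsigned epilogue_iters = 0;
  for (; i < n; i += VF)
    {
      ++epilogue_iters;
      for (unsigned lane = 0; lane < VF; ++lane)
        if (i + lane < n)
          c[i + lane] = (short) (a[i + lane] + b[i + lane]);
    }

  /* With the same VF in both loops, fewer than VF elements are left
     here, so the epilogue never takes its back edge.  */
  assert (epilogue_iters <= 1);
  return epilogue_iters;
}
```

In this model the epilogue runs exactly once when n is not a multiple of VF and not at all otherwise, which is the "not an actual loop" behaviour the cover letter points out.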
LGTM. OK if no objections and if the Power testing comes back clean.

Thanks,
Richard

> gcc/ChangeLog:
>
>         * tree-vect-loop.c (vect_transform_loop): Use main loop's various
>         thresholds to narrow the upper bound on epilogue iterations.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
> +
> +void
> +foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    c[i] = a[i] + b[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* In these calculations the "- 1" converts loop iteration counts
>       back to latch counts.  */
>    if (loop->any_upper_bound)
> -    loop->nb_iterations_upper_bound
> -      = (final_iter_may_be_partial
> -         ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                          lowest_vf) - 1
> -         : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                           lowest_vf) - 1);
> +    {
> +      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      loop->nb_iterations_upper_bound
> +        = (final_iter_may_be_partial
> +           ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                            lowest_vf) - 1
> +           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                             lowest_vf) - 1);
> +      if (main_vinfo)
> +        {
> +          unsigned int bound;
> +          poly_uint64 main_iters
> +            = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
> +                           LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
> +          main_iters
> +            = upper_bound (main_iters,
> +                           LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
> +          if (can_div_away_from_zero_p (main_iters,
> +                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                                        &bound))
> +            loop->nb_iterations_upper_bound
> +              = wi::umin ((widest_int) (bound - 1),
> +                          loop->nb_iterations_upper_bound);
> +        }
> +    }
>    if (loop->any_likely_upper_bound)
>      loop->nb_iterations_likely_upper_bound
>        = (final_iter_may_be_partial
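The extra bound added to vect_transform_loop can be checked by hand with a small scalar model. This is a hedged sketch, assuming fixed-length vectors so that poly_uint64 collapses to a plain integer and can_div_away_from_zero_p always succeeds; the helper names max3 and epilogue_latch_upper_bound are hypothetical and are not GCC APIs. The idea, as I read the patch, is that the epilogue never has to handle more scalar iterations than the largest of the main loop's vector factor, cost-model threshold and versioning threshold, so dividing that value by the epilogue's vector factor (rounding up) and subtracting 1 gives a latch-count bound that is then combined with the existing one via a minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Largest of three values; a stand-in for chaining GCC's upper_bound
   on poly_uint64 when every value is a compile-time constant.  */
static uint64_t
max3 (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t m = a > b ? a : b;
  return m > c ? m : c;
}

/* Hypothetical scalar model of the new epilogue latch bound.  */
static uint64_t
epilogue_latch_upper_bound (uint64_t main_vf, uint64_t cost_threshold,
                            uint64_t versioning_threshold,
                            uint64_t epilogue_vf, uint64_t old_latch_bound)
{
  assert (main_vf > 0 && epilogue_vf > 0);

  uint64_t main_iters = max3 (main_vf, cost_threshold, versioning_threshold);

  /* Divide rounding away from zero, i.e. ceiling for unsigned values.  */
  uint64_t bound = (main_iters + epilogue_vf - 1) / epilogue_vf;

  /* "- 1" turns an iteration count into a latch count; keep whichever
     bound is tighter.  */
  uint64_t new_bound = bound - 1;
  return new_bound < old_latch_bound ? new_bound : old_latch_bound;
}
```

For example, with a vector factor of 8 in both loops and no larger thresholds, epilogue_latch_upper_bound (8, 0, 0, 8, 1000) yields 0, i.e. the epilogue never loops back; that appears to be the situation the new test's single-WHILELO scan is checking.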