On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: > Hi, > > Currently when loop is vectorized we adjust its nb_iterations_upper_bound > by dividing it by VF. This is incorrect since nb_iterations_upper_bound > is upper bound for (<number of loop iterations> - 1) and therefore simple > dividing it by VF in many cases gives us bounds greater than a real one. > Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
Yeah, that seems correct. > Also decrement due to peeling for gaps should happen before we scale it > by VF because peeling applies to a scalar loop, not vectorized one. That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration is peeled as scalar operations. We do not account for the amount of known prologue peeling (if peeling for alignment and the misalignment is known at compile-time) - that would be peeling of scalar iterations. But it would be interesting to know why we need the != 0 check - static cost modelling should have disabled vectorization if the vectorized body isn't run. > This patch modifies nb_iterations_upper_bound computation to resolve > these issues. You do not adjust the ->nb_iterations_estimate accordingly. > Running regression testing I got one fail due to optimized loop. Heres > is a loop: > > foo (signed char s) > { > signed char i; > for (i = 0; i < s; i++) > yy[i] = (signed int) i; > } > > Here we vectorize for AVX512 using VF=64. Original loop has max 127 > iterations and therefore vectorized loop may be executed only once. > With the patch applied compiler detects it and transforms loop into > BB with just stores of constants vectors into yy. Test was adjusted > to increase number of possible iterations. A copy of test was added > to check we can optimize out the original loop. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk? I'd like to see testcases covering the corner-cases - have them have upper bound estimates by adjusting known array sizes and also cover the case of peeling for gaps. Richard. > Thanks, > Ilya > -- > gcc/ > > 2016-04-21 Ilya Enkovich <ilya.enkov...@intel.com> > > * tree-vect-loop.c (vect_transform_loop): Fix > nb_iterations_upper_bound computation for vectorized loop. > > gcc/testsuite/ > > 2016-04-21 Ilya Enkovich <ilya.enkov...@intel.com> > > * gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid > optimization of vector loop. > * gcc.target/i386/vect-unpack-3.c: New test. > > > diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c > b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c > index 4825248..51c518e 100644 > --- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c > +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c > @@ -6,19 +6,22 @@ > > #define N 120 > signed int yy[10000]; > +signed char zz[10000]; > > void > -__attribute__ ((noinline)) foo (signed char s) > +__attribute__ ((noinline,noclone)) foo (int s) > { > - signed char i; > + int i; > for (i = 0; i < s; i++) > - yy[i] = (signed int) i; > + yy[i] = zz[i]; > } > > void > avx512bw_test () > { > signed char i; > + for (i = 0; i < N; i++) > + zz[i] = i; > foo (N); > for (i = 0; i < N; i++) > if ( (signed int)i != yy [i] ) > diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c > b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c > new file mode 100644 > index 0000000..eb8a93e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math > -mavx512bw -save-temps" } */ > +/* { dg-require-effective-target avx512bw } */ > + > +#include "avx512bw-check.h" > + > +#define N 120 > +signed int yy[10000]; > + > +void > +__attribute__ ((noinline)) foo (signed char s) > +{ > + signed char i; > + for (i = 0; i < s; i++) > + yy[i] = (signed int) i; > +} > + > +void > +avx512bw_test () > +{ > + signed char i; > + foo (N); > + for (i = 0; i < N; i++) > + if ( (signed int)i != yy [i] ) > + abort (); > +} > + > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > +/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */ > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index d813b86..da98211 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo) > /* Reduce loop iterations by the vectorization factor. */ > scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor), > expected_iterations / vectorization_factor); > - loop->nb_iterations_upper_bound > - = wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor); > if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) > && loop->nb_iterations_upper_bound != 0) > loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1; > + loop->nb_iterations_upper_bound > + = wi::udiv_floor (loop->nb_iterations_upper_bound + 1, > + vectorization_factor) - 1; > + > if (loop->any_estimate) > { > loop->nb_iterations_estimate