On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich <[email protected]> wrote:
> Hi,
>
> Currently when loop is vectorized we adjust its nb_iterations_upper_bound
> by dividing it by VF. This is incorrect since nb_iterations_upper_bound
> is upper bound for (<number of loop iterations> - 1) and therefore simple
> dividing it by VF in many cases gives us bounds greater than a real one.
> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
Yeah, that seems correct.
> Also decrement due to peeling for gaps should happen before we scale it
> by VF because peeling applies to a scalar loop, not vectorized one.
That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
is peeled as scalar operations. We do not account for the amount
of known prologue peeling (if peeling for alignment and the misalignment
is known at compile-time) - that would be peeling of scalar iterations.
But it would be interesting to know why we need the != 0 check - static
cost modelling should have disabled vectorization if the vectorized body
isn't run.
> This patch modifies nb_iterations_upper_bound computation to resolve
> these issues.
You do not adjust the ->nb_iterations_estimate accordingly.
> Running regression testing I got one fail due to optimized loop. Heres
> is a loop:
>
> foo (signed char s)
> {
> signed char i;
> for (i = 0; i < s; i++)
> yy[i] = (signed int) i;
> }
>
> Here we vectorize for AVX512 using VF=64. Original loop has max 127
> iterations and therefore vectorized loop may be executed only once.
> With the patch applied compiler detects it and transforms loop into
> BB with just stores of constants vectors into yy. Test was adjusted
> to increase number of possible iterations. A copy of test was added
> to check we can optimize out the original loop.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. OK for trunk?
I'd like to see testcases covering the corner-cases - have them have
upper bound estimates by adjusting known array sizes and also cover
the case of peeling for gaps.
Richard.
> Thanks,
> Ilya
> --
> gcc/
>
> 2016-04-21 Ilya Enkovich <[email protected]>
>
> * tree-vect-loop.c (vect_transform_loop): Fix
> nb_iterations_upper_bound computation for vectorized loop.
>
> gcc/testsuite/
>
> 2016-04-21 Ilya Enkovich <[email protected]>
>
> * gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
> optimization of vector loop.
> * gcc.target/i386/vect-unpack-3.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> index 4825248..51c518e 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> @@ -6,19 +6,22 @@
>
> #define N 120
> signed int yy[10000];
> +signed char zz[10000];
>
> void
> -__attribute__ ((noinline)) foo (signed char s)
> +__attribute__ ((noinline,noclone)) foo (int s)
> {
> - signed char i;
> + int i;
> for (i = 0; i < s; i++)
> - yy[i] = (signed int) i;
> + yy[i] = zz[i];
> }
>
> void
> avx512bw_test ()
> {
> signed char i;
> + for (i = 0; i < N; i++)
> + zz[i] = i;
> foo (N);
> for (i = 0; i < N; i++)
> if ( (signed int)i != yy [i] )
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> new file mode 100644
> index 0000000..eb8a93e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math
> -mavx512bw -save-temps" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check.h"
> +
> +#define N 120
> +signed int yy[10000];
> +
> +void
> +__attribute__ ((noinline)) foo (signed char s)
> +{
> + signed char i;
> + for (i = 0; i < s; i++)
> + yy[i] = (signed int) i;
> +}
> +
> +void
> +avx512bw_test ()
> +{
> + signed char i;
> + foo (N);
> + for (i = 0; i < N; i++)
> + if ( (signed int)i != yy [i] )
> + abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index d813b86..da98211 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> /* Reduce loop iterations by the vectorization factor. */
> scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
> expected_iterations / vectorization_factor);
> - loop->nb_iterations_upper_bound
> - = wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor);
> if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> && loop->nb_iterations_upper_bound != 0)
> loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
> + loop->nb_iterations_upper_bound
> + = wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
> + vectorization_factor) - 1;
> +
> if (loop->any_estimate)
> {
> loop->nb_iterations_estimate