Hi Bill,

On Wed, May 03, 2017 at 02:43:09PM -0500, Bill Schmidt wrote:
> We recently became aware of some poor code generation as a result of
> unprofitable (for POWER) loop vectorization.  When a loop is simply copying
> data with 64-bit loads and stores, vectorizing with 128-bit loads and stores
> generally does not provide any benefit on modern POWER processors.
> Furthermore, if there is a requirement to version the loop for aliasing,
> alignment, etc., the cost of the versioning test is almost certainly a
> performance loss for such loops.  The user code example included such a copy
> loop, executed only a few times on average, within an outer loop that was
> executed many times on average, causing a tremendous slowdown.
> 
> This patch very specifically targets these kinds of loops and no others,
> and artificially inflates the vectorization cost to ensure vectorization
> does not appear profitable.  This is done within the target model cost
> hooks to avoid affecting other targets.  A new test case is included that
> demonstrates the refusal to vectorize.
> 
> We've done SPEC performance testing to verify that the patch does not
> degrade such workloads.  Results were all in the noise range.  The
> customer code performance loss was verified to have been reversed.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?

> 2017-05-03  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>       * config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var.
>       (rs6000_init_cost): Initialize rs6000_vect_nonmem.
>       (rs6000_add_stmt_cost): Update rs6000_vect_nonmem.
>       (rs6000_finish_cost): Avoid vectorizing simple copy loops with
>       VF=2 that require versioning.
> 
> [gcc/testsuite]
> 
> 2017-05-03  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>       * gcc.target/powerpc/veresioned-copy-loop.c: New file.

> --- gcc/config/rs6000/rs6000.c        (revision 247560)
> +++ gcc/config/rs6000/rs6000.c        (working copy)
> @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data)
>  
>  /* Implement targetm.vectorize.init_cost.  */
>  
> +static bool rs6000_vect_nonmem;

Please put a comment on this, saying what it is for.

> +      /* Check whether we're doing something other than just a copy loop.
> +      Not all such loops may be profitably vectorized; see
> +      rs6000_finish_cost.  */
> +      if ((where == vect_body
> +        && (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm
> +            || kind == vec_promote_demote || kind == vec_construct
> +            || kind == scalar_to_vec))
> +       || (where != vect_body
> +           && (kind == vec_to_scalar || kind == vec_perm
> +               || kind == vec_promote_demote || kind == vec_construct
> +               || kind == scalar_to_vec)))
> +     rs6000_vect_nonmem = true;

Perhaps

+      if ((kind == vec_to_scalar || kind == vec_perm
+              || kind == vec_promote_demote || kind == vec_construct
+              || kind == scalar_to_vec)
+          || (where == vect_body && kind == vector_stmt))
> +     rs6000_vect_nonmem = true;

if you agree that is clearer.

Okay for trunk with the comment added, and the condition either or not
simplified.  Thanks,


Segher

Reply via email to