On Thu, May 11, 2017 at 10:59:41AM -0500, Bill Schmidt wrote:
> PR80695 identifies a case (similar to several others we've seen) where SLP
> vectorization is too aggressive about vectorizing stores.  The problem is
> that we undervalue the cost of a vec_construct operation.  vec_construct
> is the vectorizer's representation for building a vector from scalar
> elements.  When we construct an integer vector type from its constituent
> parts, this requires a direct move from two GPRs (one instruction on P9,
> two direct moves and a merge on P8).  The high cost of this is not
> reflected in the current cost calculation, which only counts the cost
> of combining the elements using N-1 inserts.  This patch provides a higher
> estimate that is closer to reality.  Note that all cost estimation for
> vectorization is a bit rough, so this should be viewed as a heuristic.
> 
> The patch treats all integer vectors separately from the default case.
> There is already special handling for V4SFmode, so this leaves only
> V2DFmode in the default case.  It was previously established heuristically
> that a cost factor of 2 was appropriate for V2DFmode, so that is left
> unchanged here; but since V2DFmode is the only default, we can simplify
> the calculation to just return 2.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?

Seems fine to me (well, minor stuff below).  Thanks,


Segher


> 2017-05-11  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>       PR target/80695
>       * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
>       Account for direct move costs for vec_construct of integer
>       vectors.
> 
> [gcc/testsuite]
> 
> 2017-05-11  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>       PR target/80695
>       * gcc.target/powerpc/pr80695-p8.c: New file.
>       * gcc.target/powerpc/pr80695-p9.c: New file.


> @@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>       if (SCALAR_FLOAT_TYPE_P (elem_type)
>           && TYPE_PRECISION (elem_type) == 32)
>         return 5;
> +     /* On POWER9, integer vector types are built up in GPRs and then
> +           use a direct move (2 cycles).  For POWER8 this is even worse,
> +           as we need two direct moves and a merge, and the direct moves
> +        are five cycles.  */

You're mixing tabs and spaces here.
