Hi,

PR80695 identifies a case (similar to several others we've seen) where SLP
vectorization is too aggressive about vectorizing stores.  The problem is
that we undervalue the cost of a vec_construct operation.  vec_construct
is the vectorizer's representation for building a vector from scalar
elements.  Constructing an integer vector from its constituent parts
requires moving the data from GPRs into a vector register: one direct
move from two GPRs on P9, or two direct moves and a merge on P8.  The
current cost calculation does not reflect this high cost; it counts only
the cost of combining the elements using N-1 inserts.  This patch provides a higher
estimate that is closer to reality.  Note that all cost estimation for
vectorization is a bit rough, so this should be viewed as a heuristic.
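
To make the size of the change concrete, here is a rough standalone sketch
of the old and new arithmetic for integer vectors.  The function names and
the is_p9 flag are made up for illustration and are not GCC code; the
constants 2 and 11 are the ones used in the patch below.

```c
#include <assert.h>

/* Hypothetical standalone model of the vec_construct cost for integer
   vectors; not GCC code.  NUNITS is the number of vector elements.  */

/* Old estimate: N-1 inserts, with a floor of 2.  */
int
old_int_construct_cost (int nunits)
{
  assert (nunits >= 1);
  return nunits - 1 > 2 ? nunits - 1 : 2;
}

/* New estimate: N-1 inserts plus the direct-move overhead
   (2 on POWER9, 11 on POWER8).  IS_P9 is an assumed flag standing
   in for the target check.  */
int
new_int_construct_cost (int nunits, int is_p9)
{
  assert (nunits >= 1);
  return nunits - 1 + (is_p9 ? 2 : 11);
}
```

Under this model a two-element vector of longs goes from a cost of 2 to
3 on P9 and 12 on P8, and a four-element vector of ints goes from 3 to
5 on P9 and 14 on P8.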

The patch treats all integer vectors separately from the default case.
There is already special handling for V4SFmode, so this leaves only
V2DFmode in the default case.  It was previously established heuristically
that a cost factor of 2 was appropriate for V2DFmode, so that is left
unchanged here.  Since V2DFmode is now the only mode that reaches the
default case, the calculation simplifies to just returning 2.
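
As a quick check of that simplification, here is a small standalone model
of the old default-case formula.  The function name is hypothetical and
this is not GCC code; it only mirrors the max (2, N-1) expression that the
patch removes.

```c
#include <assert.h>

/* Hypothetical model of the old default-case formula,
   max (2, N-1); not GCC code.  NUNITS is the element count.  */
int
old_default_construct_cost (int nunits)
{
  assert (nunits >= 1);
  return nunits - 1 > 2 ? nunits - 1 : 2;
}
```

V2DFmode has two elements, so the old expression always produced 2 for
it, and returning the constant gives the same cost.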

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this ok for trunk?

Thanks,
Bill


[gcc]

2017-05-11  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR target/80695
        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
        Account for direct move costs for vec_construct of integer
        vectors.

[gcc/testsuite]

2017-05-11  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR target/80695
        * gcc.target/powerpc/pr80695-p8.c: New file.
        * gcc.target/powerpc/pr80695-p9.c: New file.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 247809)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
        if (SCALAR_FLOAT_TYPE_P (elem_type)
            && TYPE_PRECISION (elem_type) == 32)
          return 5;
+       /* On POWER9, integer vectors are built up in GPRs and then
+          transferred with a direct move (2 cycles).  For POWER8 this is
+          even worse, as we need two direct moves and a merge, and the
+          direct moves are five cycles.  */
+       else if (INTEGRAL_TYPE_P (elem_type))
+         {
+           if (TARGET_P9_VECTOR)
+             return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
+           else
+             return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
+         }
        else
-         return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
+         /* V2DFmode doesn't need a direct move.  */
+         return 2;
 
       default:
         gcc_unreachable ();
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c       (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c       (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER8.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c       (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c       (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER9.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
