Hi Bill, On Wed, May 03, 2017 at 02:43:09PM -0500, Bill Schmidt wrote: > We recently became aware of some poor code generation as a result of > unprofitable (for POWER) loop vectorization. When a loop is simply copying > data with 64-bit loads and stores, vectorizing with 128-bit loads and stores > generally does not provide any benefit on modern POWER processors. > Furthermore, if there is a requirement to version the loop for aliasing, > alignment, etc., the cost of the versioning test is almost certainly a > performance loss for such loops. The user code example included such a copy > loop, executed only a few times on average, within an outer loop that was > executed many times on average, causing a tremendous slowdown. > > This patch very specifically targets these kinds of loops and no others, > and artificially inflates the vectorization cost to ensure vectorization > does not appear profitable. This is done within the target model cost > hooks to avoid affecting other targets. A new test case is included that > demonstrates the refusal to vectorize. > > We've done SPEC performance testing to verify that the patch does not > degrade such workloads. Results were all in the noise range. The > customer code performance loss was verified to have been reversed. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > Is this ok for trunk?
> 2017-05-03 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var. > (rs6000_init_cost): Initialize rs6000_vect_nonmem. > (rs6000_add_stmt_cost): Update rs6000_vect_nonmem. > (rs6000_finish_cost): Avoid vectorizing simple copy loops with > VF=2 that require versioning. > > [gcc/testsuite] > > 2017-05-03 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > * gcc.target/powerpc/veresioned-copy-loop.c: New file. > --- gcc/config/rs6000/rs6000.c (revision 247560) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data) > > /* Implement targetm.vectorize.init_cost. */ > > +static bool rs6000_vect_nonmem; Please put a comment on this, saying what it is for. > + /* Check whether we're doing something other than just a copy loop. > + Not all such loops may be profitably vectorized; see > + rs6000_finish_cost. */ > + if ((where == vect_body > + && (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm > + || kind == vec_promote_demote || kind == vec_construct > + || kind == scalar_to_vec)) > + || (where != vect_body > + && (kind == vec_to_scalar || kind == vec_perm > + || kind == vec_promote_demote || kind == vec_construct > + || kind == scalar_to_vec))) > + rs6000_vect_nonmem = true; Perhaps + if ((kind == vec_to_scalar || kind == vec_perm + || kind == vec_promote_demote || kind == vec_construct + || kind == scalar_to_vec) + || (where == vect_body && kind == vector_stmt)) > + rs6000_vect_nonmem = true; if you agree that is clearer. Okay for trunk with the comment added, and the condition either or not simplified. Thanks, Segher