On 9/19/17 12:38 PM, Bill Schmidt wrote:
> Hi,
>
> https://gcc.gnu.org/PR82255 identifies a problem in the vector cost model
> where a vectorized load is treated as having the cost of a strided load
> in a case where we will not actually generate a strided load.  This is
> simply a mismatch between the conditions tested in the cost model and
> those tested in the code that generates vectorized instructions.  This
> patch fixes the problem by recognizing when only a single non-strided
> load will be generated and reporting the cost accordingly.
>
> I believe this patch is sufficient to catch all such cases, but I admit
> that the code in vectorizable_load is complex enough that I could have
> missed a trick.
>
> I've added a test in the PowerPC cost model subdirectory.  Even though
> this isn't a target-specific issue, the test does rely on a 16-byte
> vector size, so this seems safest.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this ok for trunk?
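For readers skimming the thread, the condition described above can be sketched in isolation. This is only an illustration, not the GCC code: the enum and the function name needs_vec_construct are hypothetical stand-ins, and the sketch assumes the only inputs that matter are the access kind, the group size, and the number of elements per vector.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vectorizer's memory access kinds.
   Only the spirit of the names is taken from the patch.  */
enum access_kind
{
  ACCESS_CONTIGUOUS,
  ACCESS_ELEMENTWISE,
  ACCESS_STRIDED_SLP
};

/* Return true when a vec_construct cost should be charged for a load.
   Assumption: an elementwise or strided-SLP access whose group already
   fills a whole vector is emitted as one ordinary vector load, so no
   vector has to be assembled from pieces.  */
static bool
needs_vec_construct (enum access_kind kind, int group_size, int nunits)
{
  if (kind != ACCESS_ELEMENTWISE && kind != ACCESS_STRIDED_SLP)
    return false;

  /* Only a group smaller than the vector needs construction.  */
  return group_size < nunits;
}
```

Under the assumption that the 16 unsigned char loads in the test's inner loop form one group and the vector holds 16 elements, needs_vec_construct (ACCESS_STRIDED_SLP, 16, 16) is false, which is the case the new test checks for.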

After posting, I realized that I had wrongly recalculated stmt_info in
the patch.  Here's a new version (also passing regstrap) that removes
that flaw.

[gcc]

2017-09-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR tree-optimization/82255
        * tree-vect-stmts.c (vect_model_load_cost): Don't count
        vec_construct cost when a true strided load isn't present.

[gcc/testsuite]

2017-09-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR tree-optimization/82255
        * gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New file.


Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__)) __attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+    {
+      for (int b = 0; b < 16; b++)
+        tot += abs (w[b] - x[b]);
+      w += i;
+      x += j;
+    }
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct required" 0 "vect" } } */
+
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c (revision 252760)
+++ gcc/tree-vect-stmts.c (working copy)
@@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
                         prologue_cost_vec, body_cost_vec, true);
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
-    inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
-                                     stmt_info, 0, vect_body);
+    {
+      int group_size = GROUP_SIZE (stmt_info);
+      int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+      if (group_size < nunits)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: vec_construct required");
+          inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+                                           vec_construct, stmt_info, 0,
+                                           vect_body);
+        }
+    }
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,