The vectorizer cost model has a serious issue: it does not deal well with targets whose scalar stmt cost is != 1. This is because it passes the scalar iteration _cost_ to routines that scale that cost by the target's scalar stmt cost again. This is visible, for example, on x86_64 for all AMD archs, which use a high scalar stmt cost (6).
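To make the double scaling concrete, here is a minimal sketch in plain C. It is not GCC code: the `model_*` functions are hypothetical stand-ins for the cost hooks, and the model assumes every statement in the iteration costs exactly one scalar stmt.

```c
#include <assert.h>

/* Hypothetical stand-in for the target cost hook: the cost of COUNT
   statements when one scalar stmt costs STMT_COST.  */
int
model_add_stmt_cost (int count, int stmt_cost)
{
  assert (count >= 0 && stmt_cost > 0);
  return count * stmt_cost;
}

/* Cost of one scalar iteration consisting of NSTMTS scalar stmts.
   This value is already scaled by STMT_COST.  */
int
model_single_iter_cost (int nstmts, int stmt_cost)
{
  return model_add_stmt_cost (nstmts, stmt_cost);
}

/* The problematic pattern: an already scaled iteration cost is passed
   where a statement count is expected, so STMT_COST is applied twice.  */
int
model_peel_cost_double_scaled (int peel_iters, int nstmts, int stmt_cost)
{
  int iter_cost = model_single_iter_cost (nstmts, stmt_cost);
  return model_add_stmt_cost (peel_iters * iter_cost, stmt_cost);
}

/* The intended computation: pass a statement count, scale once.  */
int
model_peel_cost_intended (int peel_iters, int nstmts, int stmt_cost)
{
  return model_add_stmt_cost (peel_iters * nstmts, stmt_cost);
}
```

With a scalar stmt cost of 1 both variants agree, which is presumably why the issue goes unnoticed on such targets. With a cost of 6, three peeled iterations of two stmts come out as 216 instead of 36 in this model.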
I am testing the following patch to fix that - for GCC 6 we might want to avoid the roundoff errors that can appear.

Richard.

2015-02-10  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/64909
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Properly
	pass a scalar-stmt count estimate to the cost model.
	* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Likewise.

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr64909.c: New testcase.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 220540)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -2834,6 +2834,11 @@ vect_estimate_min_profitable_iters (loop
      statements.  */
 
   scalar_single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
+  /* ???  Below we use this cost as number of stmts with scalar_stmt cost,
+     thus divide by that.  This introduces rounding errors, thus better
+     introduce a new cost kind (raw_cost?  scalar_iter_cost?).  */
+  int scalar_single_iter_stmts
+    = scalar_single_iter_cost / vect_get_stmt_cost (scalar_stmt);
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
      loop.
@@ -2868,10 +2873,10 @@ vect_estimate_min_profitable_iters (loop
       /* FORNOW: Don't attempt to pass individual scalar instructions to
          the model; just assume linear cost for scalar iterations.  */
       (void) add_stmt_cost (target_cost_data,
-                            peel_iters_prologue * scalar_single_iter_cost,
+                            peel_iters_prologue * scalar_single_iter_stmts,
                             scalar_stmt, NULL, 0, vect_prologue);
       (void) add_stmt_cost (target_cost_data,
-                            peel_iters_epilogue * scalar_single_iter_cost,
+                            peel_iters_epilogue * scalar_single_iter_stmts,
                             scalar_stmt, NULL, 0, vect_epilogue);
     }
   else
@@ -2887,7 +2892,7 @@ vect_estimate_min_profitable_iters (loop
 
       (void) vect_get_known_peeling_cost (loop_vinfo, peel_iters_prologue,
                                           &peel_iters_epilogue,
-                                          scalar_single_iter_cost,
+                                          scalar_single_iter_stmts,
                                           &prologue_cost_vec,
                                           &epilogue_cost_vec);
 
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 220540)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1184,10 +1206,13 @@ vect_peeling_hash_get_lowest_cost (_vect
     }
 
   single_iter_cost = vect_get_single_scalar_iteration_cost (loop_vinfo);
-  outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem->npeel,
-                                               &dummy, single_iter_cost,
-                                               &prologue_cost_vec,
-                                               &epilogue_cost_vec);
+  outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, elem->npeel, &dummy,
+     /* ???  We use this cost as number of stmts with scalar_stmt cost,
+        thus divide by that.  This introduces rounding errors, thus better
+        introduce a new cost kind (raw_cost?  scalar_iter_cost?).  */
+     single_iter_cost / vect_get_stmt_cost (scalar_stmt),
+     &prologue_cost_vec, &epilogue_cost_vec);
 
   /* Prologue and epilogue costs are added to the target model later.
      These costs depend only on the scalar iteration cost, the
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr64909.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr64909.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr64909.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mtune=bdver1" } */
+
+unsigned short a[32];
+unsigned int b[32];
+void t()
+{
+  int i;
+  for (i=0;i<12;i++)
+    b[i]=a[i];
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
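The rounding error mentioned in the ??? comments can be illustrated with a small sketch, again in plain C and not GCC code. The `model_*` functions are hypothetical, and it assumes the single-iteration cost is not always a multiple of the scalar stmt cost (for instance because it sums statement kinds with different costs); the figures used in the note below are made up.

```c
#include <assert.h>

/* Hypothetical model of the division in the patch: estimate a stmt
   count from an iteration cost.  Integer division truncates.  */
int
model_iter_stmts (int iter_cost, int scalar_stmt_cost)
{
  assert (iter_cost >= 0 && scalar_stmt_cost > 0);
  return iter_cost / scalar_stmt_cost;
}

/* Cost the model ends up with once the stmt count estimate is scaled
   by the scalar stmt cost again.  */
int
model_rescaled_peel_cost (int peel_iters, int iter_cost,
                          int scalar_stmt_cost)
{
  int stmts = model_iter_stmts (iter_cost, scalar_stmt_cost);
  return peel_iters * stmts * scalar_stmt_cost;
}

/* Cost dropped by the truncation, relative to scaling PEEL_ITERS by the
   raw iteration cost.  */
int
model_rounding_loss (int peel_iters, int iter_cost, int scalar_stmt_cost)
{
  return peel_iters * iter_cost
         - model_rescaled_peel_cost (peel_iters, iter_cost,
                                     scalar_stmt_cost);
}
```

For example, with a made-up iteration cost of 14 and a scalar stmt cost of 6, the estimate is 2 stmts, so three peeled iterations are costed at 36 rather than 42. When the iteration cost is an exact multiple of the scalar stmt cost, nothing is lost. A separate cost kind, as the comments suggest, would avoid the division altogether.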