The following fixes PR56118 by adjusting the cost model handling of basic-block vectorization to favor the vectorized version in case estimated cost is the same as the estimated cost of the scalar version. This makes sense because we over-estimate the vectorized cost in several places.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-11-10 Richard Biener <rguent...@suse.de> PR tree-optimization/56118 * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal cost favor vectorized version. * gcc.target/i386/pr56118.c: New testcase. Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 230020) +++ gcc/tree-vect-slp.c (working copy) @@ -2317,9 +2317,12 @@ vect_bb_vectorization_profitable_p (bb_v dump_printf (MSG_NOTE, " Scalar cost of basic block: %d\n", scalar_cost); } - /* Vectorization is profitable if its cost is less than the cost of scalar - version. */ - if (vec_outside_cost + vec_inside_cost >= scalar_cost) + /* Vectorization is profitable if its cost is more than the cost of scalar + version. Note that we err on the vector side for equal cost because + the cost estimate is otherwise quite pessimistic (constant uses are + free on the scalar side but cost a load on the vector side for + example). */ + if (vec_outside_cost + vec_inside_cost > scalar_cost) return false; return true; Index: gcc/testsuite/gcc.target/i386/pr56118.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr56118.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr56118.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -msse2" } */ + +#include <emmintrin.h> + +__m128d f() +{ + __m128d r={3,4}; + r[0]=1; + r[1]=2; + return r; +} + +/* We want to "vectorize" this to a aligned vector load from the + constant pool. */ + +/* { dg-final { scan-assembler "movapd" } } */