https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78164
Bug ID: 78164
Summary: SLP vectorizer: prologue cost biased by redundancies
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---

From http://stackoverflow.com/q/39947582/1918193 :

void testfunc_flat(double a, double b, double* dst) {
    dst[0] = 0.1 + ( a)*(1.0 + 0.5*( a));
    dst[1] = 0.1 + ( b)*(1.0 + 0.5*( b));
    dst[2] = 0.1 + (-a)*(1.0 + 0.5*(-a));
    dst[3] = 0.1 + (-b)*(1.0 + 0.5*(-b));
}

We fail to vectorize with AVX; that is understandable, since the four operations are not all the same. More surprising is that we also reject SSE vectorization:

Vector inside of basic block cost: 14
Vector prologue cost: 10
Vector epilogue cost: 0
Scalar cost of basic block: 22

However, if I disable the cost model, I can see the prologue that is supposed to have cost 10:

vect_cst__47 = { 1.000000000000000055511151231257827021181583404541015625e-1, 1.000000000000000055511151231257827021181583404541015625e-1 };
vect_cst__44 = { 1.0e+0, 1.0e+0 };
vect_cst__42 = { 5.0e-1, 5.0e-1 };
vect_cst__40 = {a_19(D), b_23(D)};
vect_cst__38 = {a_19(D), b_23(D)};
vect_cst__34 = { 1.000000000000000055511151231257827021181583404541015625e-1, 1.000000000000000055511151231257827021181583404541015625e-1 };
vect_cst__32 = {a_19(D), b_23(D)};
vect_cst__30 = { 1.0e+0, 1.0e+0 };
vect_cst__28 = { 5.0e-1, 5.0e-1 };
vect_cst__27 = {a_19(D), b_23(D)};

Only four of these ten vector constants are distinct ({0.1, 0.1}, {1.0, 1.0}, {0.5, 0.5}, and {a, b}). Some very basic CSE would bring the prologue cost down to 4 and allow vectorizing, like LLVM does.
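As an illustration of what the CSE'd SSE vectorization could look like, here is a hypothetical hand-written SSE2 version (the function name `testfunc_sse` and the lane layout are my own, not from the report). Each vector constant is materialized once and reused, so the prologue needs only the four distinct vectors listed above; note that `0.1 + (-x)*(1.0 + 0.5*(-x))` rewrites to `0.1 - x*(1.0 - 0.5*x)`, which lets the two halves share `{a, b}`:

```c
#include <emmintrin.h>

void testfunc_sse(double a, double b, double *dst)
{
    /* The four distinct prologue constants. */
    __m128d c01 = _mm_set1_pd(0.1);
    __m128d c1  = _mm_set1_pd(1.0);
    __m128d c05 = _mm_set1_pd(0.5);
    __m128d vab = _mm_set_pd(b, a);   /* lane 0 = a, lane 1 = b */

    /* dst[0..1] = 0.1 + x*(1.0 + 0.5*x), with x = {a, b} */
    __m128d lo = _mm_add_pd(c01,
        _mm_mul_pd(vab, _mm_add_pd(c1, _mm_mul_pd(c05, vab))));

    /* dst[2..3] = 0.1 + (-x)*(1.0 + 0.5*(-x)) = 0.1 - x*(1.0 - 0.5*x) */
    __m128d hi = _mm_sub_pd(c01,
        _mm_mul_pd(vab, _mm_sub_pd(c1, _mm_mul_pd(c05, vab))));

    _mm_storeu_pd(dst,     lo);
    _mm_storeu_pd(dst + 2, hi);
}
```

This is only a sketch of the code shape the vectorizer could emit, not what GCC actually generates; the point is that the eight arithmetic lanes collapse to two packed FMA-like chains once the constants are shared.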