https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Comparing just the slp2 lines with cost in them, I see with the patch
-pr116979.c:8:10: note: vect_model_simple_cost: inside_cost = 12, prologue_cost = 0
-_8 * _10 1 times scalar_stmt costs 16 in body
-_8 * _11 1 times scalar_stmt costs 16 in body
-_8 * _10 1 times vector_stmt costs 16 in body
-.VEC_ADDSUB (_12, _13) 1 times vector_stmt costs 12 in body
+.VEC_FMADDSUB (_8, _10, _13) 1 times vector_stmt costs 12 in body
-  Vector cost: 172
-  Scalar cost: 184
+  Vector cost: 156
+  Scalar cost: 152
So, I have really no idea what's going on.  The scalar cost doesn't count 2 of
the scalar multiplications for some reason and one vector multiplication (that
makes sense, because .VEC_FMADDSUB replaces both one vector multiplication and
.VEC_ADDSUB).
With -fvect-cost-model=unlimited I can see all the scalar multiplications still
around but all will be eventually dead:
  _12 = _25 * _29;
  _13 = _9 * _11;
  _14 = _25 * _30;
  _15 = _9 * _10;
  _16 = _12 - _13;
  _17 = _14 + _15;
  if (_44 unord _45)
    goto <bb 3>; [0.05%]
  else
    goto <bb 5>; [99.95%]

  <bb 5> [local count: 1073204960]:
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 536864]:
  _18 = __mulsc3 (_25, _35, _29, _30);
  _19 = REALPART_EXPR <_18>;
  _20 = IMAGPART_EXPR <_18>;
  _46 = {_19, _20};

  <bb 4> [local count: 1073741824]:
  # _21 = PHI <_16(5), _19(3)>
  # _22 = PHI <_17(5), _20(3)>
  # vect__21.21_47 = PHI <vect__3.20_43(5), _46(3)>
  MEM <vector(2) float> [(float *)&D.3124] = vect__21.21_47;
where _21 and _22 are unused and so are _12 to _17.
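
For readers following the dump: the scalar statements _12 to _17 together with
the unord check have the shape of the usual expansion of a complex float
multiply, with a rarely taken fallback to the libgcc routine __mulsc3.  The
following is only a rough C sketch of that shape, not the contents of
pr116979.c.  The function name cmul_sketch, the parameter names a, b, c, d and
the out array are hypothetical, and the mapping of SSA names to operands in the
comments is inferred from the dump, so treat it as an assumption.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper, not from the PR: multiplies (a + b*i) by (c + d*i)
   and stores the real and imaginary parts in out[0] and out[1].  */
static void cmul_sketch(float a, float b, float c, float d, float out[2])
{
    /* Fast path: four multiplies, one subtract, one add.  This is the
       assumed counterpart of _12 .. _17 in the dump.  */
    float re = a * c - b * d;
    float im = a * d + b * c;

    /* Assumed counterpart of "if (_44 unord _45)": taken only when at
       least one of the two results is a NaN.  */
    if (isunordered(re, im)) {
        /* Stand-in for the __mulsc3 (a, b, c, d) call in <bb 3>; the
           compiler's own complex multiply handles the special cases.  */
        float complex z = CMPLXF(a, b) * CMPLXF(c, d);
        memcpy(out, &z, sizeof z); /* like _46 = {_19, _20} */
        return;
    }

    out[0] = re;
    out[1] = im;
}
```

Calling cmul_sketch (1.0f, 2.0f, 3.0f, 4.0f, out) takes the fast path and
leaves -5 and 10 in out.  In the vectorized code from the dump, the same fast
path result arrives through vect__3.20_43 instead, which would explain why the
scalar statements _12 to _17 end up unused.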