https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120747
--- Comment #14 from Filip Kastl <pheeck at gcc dot gnu.org> ---
If I compile with -fdump-tree-optimized, I see these two differences in function inl1100:

A has higher numerical error (3.09998e+02) | B has ok numerical error (3.12012e+02)
-------------------------------------------+-----------------------------------------
_96 = vnb12_138 - vnb6_137;                | _96 = vnbtot_185 - vnb6_139;
vnbtot_139 = _96 + vnbtot_183;             | vnbtot_141 = _96 + vnb12_140;

and

_57 = .FMS (vnb12_138, 1.2e+1, _56);       | _99 = .FMS (_56, _97, _58);
_58 = .FMA (_54, _98, _57);                | _60 = .FMA (vnb12_140, 1.2e+1, _99);

I'm not 100% sure, but I think that those are the only significant differences in inl1100.

So in dump A we compute

x = (vnb12 - vnb6) + vnbtot
z = FMA(a, b, FMS(vnb12, 1.2e+1, c)) = a * b + (vnb12 * 1.2e+1 - c)

and in dump B we have

x = (vnbtot - vnb6) + vnb12
z = FMA(vnb12, 1.2e+1, FMS(a, b, c)) = vnb12 * 1.2e+1 + (a * b - c)

Here a, b and c stand for the remaining SSA operands (_54, _98, _56 in dump A; _56, _97, _58 in dump B).

So apparently, based on range info, GCC picks one of two computations that are equivalent up to commutativity and associativity. They would be equal in exact arithmetic, but in floating point they round differently.

Btw, the situation for function inl1120 is almost exactly the same.
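
To illustrate why the two orderings of x can give different results, here is a minimal Python sketch. It only demonstrates that floating-point addition is not associative. The function names order_a and order_b and the input values are made up for the demonstration and are not taken from the benchmark, and Python floats are doubles, whatever precision the benchmark itself uses.

```python
# Sketch: the two evaluation orders of x from dumps A and B.
# The input values below are hypothetical, chosen only to make the
# rounding difference obvious; they are not from the benchmark.

def order_a(vnb12, vnb6, vnbtot):
    # Dump A: _96 = vnb12 - vnb6; vnbtot = _96 + vnbtot
    return (vnb12 - vnb6) + vnbtot

def order_b(vnb12, vnb6, vnbtot):
    # Dump B: _96 = vnbtot - vnb6; vnbtot = _96 + vnb12
    return (vnbtot - vnb6) + vnb12

if __name__ == "__main__":
    vnb12, vnb6, vnbtot = 1e16, 1e16, 1.0
    print(order_a(vnb12, vnb6, vnbtot))  # 1.0: the large terms cancel first
    print(order_b(vnb12, vnb6, vnbtot))  # 0.0: 1.0 is rounded away in 1.0 - 1e16
```

In exact arithmetic both functions return 1. The difference comes purely from which intermediate result gets rounded.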