https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77438
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Gcc does avoid using the stack when it is more efficient to do so (depends on
the -march setting). Yes, using SSE would be better. The general advice is to
stop using MMX.

Using gcc's vector extension generates even worse code in this case:

  _8 = VIEW_CONVERT_EXPR<long unsigned int>(x_4(D));
  _9 = VIEW_CONVERT_EXPR<long unsigned int>(y_5(D));
  _10 = _8 ^ _9;
  _11 = _9 & 9187201950435737471;
  _12 = _8 & 9187201950435737471;
  _13 = _10 & 9259542123273814144;
  _14 = _11 + _12;
  _15 = _13 ^ _14;
  _6 = VIEW_CONVERT_EXPR<__m64>(_15);

(I think there is another PR asking that vector lowering allow the use of
larger vector types)
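
For readers decoding the dump: the two constants are 0x7f7f7f7f7f7f7f7f and 0x8080808080808080, so the lowered code looks like a byte-wise addition of eight 8-bit lanes done in a single 64-bit integer register. That reading is an assumption, since the original test case is not quoted in this comment. A minimal C sketch of the same sequence follows. The function names are hypothetical and this is not GCC's own code, only a restatement of the statements above next to a plain per-byte reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the GIMPLE statements _10 .. _15 (names are illustrative). */
uint64_t swar_add_bytes(uint64_t x, uint64_t y)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL; /* 9187201950435737471 */
    const uint64_t high = 0x8080808080808080ULL; /* 9259542123273814144 */

    /* Add the low 7 bits of every byte; a carry can reach bit 7 of the
       same byte but can never cross into the neighbouring byte. */
    uint64_t sum_low = (x & low7) + (y & low7);

    /* Fold in the top bit of each byte with XOR, which is addition
       without carry-out, so each lane wraps modulo 256. */
    return ((x ^ y) & high) ^ sum_low;
}

/* Reference: add the eight bytes one at a time, wrapping modulo 256. */
uint64_t add_bytes_ref(uint64_t x, uint64_t y)
{
    uint8_t a[8], b[8];
    uint64_t r;

    memcpy(a, &x, sizeof a);
    memcpy(b, &y, sizeof b);
    for (int i = 0; i < 8; i++)
        a[i] = (uint8_t)(a[i] + b[i]);
    memcpy(&r, a, sizeof r);
    return r;
}

/* Aborts if the two versions disagree for the given inputs. */
void check_pair(uint64_t x, uint64_t y)
{
    assert(swar_add_bytes(x, y) == add_bytes_ref(x, y));
}
```

Calling check_pair with a few values where lanes overflow (for example 0xff in one byte plus 0x01 in the same byte) is a quick way to see that no carry leaks into the next lane.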