https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84757
Bug ID: 84757
Summary: Useless MOVs and PUSHes to store results of MUL
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: b7.10110111 at gmail dot com
Target Milestone: ---

Consider the following C code:

#ifdef __SIZEOF_INT128__
typedef __uint128_t Longer;
#else
typedef unsigned long long Longer;
#endif
typedef unsigned long Shorter;

Shorter mulSmarter(Shorter a, Shorter b, Shorter* upper)
{
    const Longer ab=(Longer)a*b;
    *upper=ab >> 8*sizeof(Shorter);
    return ab;
}

On amd64 with the -m64 option I get identical assembly from both gcc 7.x and 6.3.
But on x86 (or amd64 with -m32) the assembly differs, and the gcc 7.x output is
less efficient. Compare:

# gcc 6.3
mulSmarter:
        mov     eax, DWORD PTR [esp+8]
        mul     DWORD PTR [esp+4]
        mov     ecx, edx
        mov     edx, DWORD PTR [esp+12]
        mov     DWORD PTR [edx], ecx
        ret

# gcc 7.3
mulSmarter:
        push    esi
        push    ebx
        mov     eax, DWORD PTR [esp+16]
        mul     DWORD PTR [esp+12]
        mov     esi, edx
        mov     edx, DWORD PTR [esp+20]
        mov     ebx, eax
        mov     eax, ebx
        mov     DWORD PTR [edx], esi
        pop     ebx
        pop     esi
        ret

The gcc 6.3 version is already not perfect, but it is much better than that of
7.3.
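
For anyone reproducing this, the following is a minimal sketch of a cross-check that could accompany the test case. It is not part of the original report. It repeats mulSmarter with explicit casts and compares it against a schoolbook half-word multiply, so that the low and high halves can be verified independently of the widening multiplication. The names mulReference and halvesAgree are hypothetical helpers introduced only for illustration. Like the original, it assumes Longer is at least twice as wide as Shorter, which holds with -m32 and on amd64 with __uint128_t.

```c
#include <limits.h>

#ifdef __SIZEOF_INT128__
typedef __uint128_t Longer;
#else
typedef unsigned long long Longer;
#endif
typedef unsigned long Shorter;

/* Function from the report; only explicit casts were added. */
Shorter mulSmarter(Shorter a, Shorter b, Shorter *upper)
{
    const Longer ab = (Longer)a * b;
    *upper = (Shorter)(ab >> 8 * sizeof(Shorter));
    return (Shorter)ab;
}

/* Hypothetical reference: schoolbook multiply on half-words,
   using only Shorter-width arithmetic. */
Shorter mulReference(Shorter a, Shorter b, Shorter *upper)
{
    const unsigned half = (unsigned)(CHAR_BIT * sizeof(Shorter) / 2);
    const Shorter mask = ((Shorter)1 << half) - 1;
    const Shorter al = a & mask, ah = a >> half;
    const Shorter bl = b & mask, bh = b >> half;

    const Shorter ll = al * bl;   /* low  x low  */
    const Shorter lh = al * bh;   /* low  x high */
    const Shorter hl = ah * bl;   /* high x low  */
    const Shorter hh = ah * bh;   /* high x high */

    /* Sum of the middle column; at most 3 * mask, so it cannot overflow. */
    const Shorter mid = (ll >> half) + (lh & mask) + (hl & mask);

    *upper = hh + (lh >> half) + (hl >> half) + (mid >> half);
    return (mid << half) | (ll & mask);
}

/* Returns 1 if both implementations agree on both halves, else 0. */
int halvesAgree(Shorter a, Shorter b)
{
    Shorter hi1, hi2;
    const Shorter lo1 = mulSmarter(a, b, &hi1);
    const Shorter lo2 = mulReference(a, b, &hi2);
    return lo1 == lo2 && hi1 == hi2;
}
```

To inspect the generated code, something like `gcc -m32 -O2 -S -masm=intel` should produce Intel-syntax assembly comparable to the listings above. The optimization level here is an assumption, since the report does not state which flags were used besides -m32.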