https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102840
--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
With -m64, before:

test:
        movq    .LC1(%rip), %mm0
        paddb   .LC0(%rip), %mm0
        movq    %xmm0, x(%rip)
        ret

And after:

test:
        movq    .LC2(%rip), %rax
        movq    %rax, x(%rip)
        ret

So we have two movq before, and two movq after, but clearly we've avoided the
computation at run-time. It's difficult (for me) to judge whether the -m32's
use of immediate constants is now better than -m64's load memory/store memory
idiom in the "average case", but in the worst case [data cache miss], the
former is clearly better [requiring fewer memory transactions].
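
For readers who want to see what is being folded, here is a rough portable C
sketch of the two code shapes. It is not the testcase from this PR: add_bytes
is a hypothetical helper that models the lane-wise wraparound of paddb, and
the LC0/LC1/LC2 values are made-up operands standing in for the constant pool
entries, chosen only so the result is easy to check by hand.

```c
#include <stdint.h>

/* Hypothetical helper: add eight packed bytes lane by lane, each lane
   wrapping modulo 256, which is the behaviour paddb has on a 64-bit value. */
static uint64_t add_bytes(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        /* Truncating to uint8_t discards any carry out of this lane. */
        uint8_t lane = (uint8_t)((a >> (8 * i)) + (b >> (8 * i)));
        r |= (uint64_t)lane << (8 * i);
    }
    return r;
}

/* Assumed operands, standing in for .LC0 and .LC1 (not the real values). */
#define LC0 UINT64_C(0x0807060504030201)
#define LC1 UINT64_C(0x1817161514131211)
/* The folded constant, standing in for .LC2: lane-wise LC1 + LC0. */
#define LC2 UINT64_C(0x201e1c1a18161412)

uint64_t x;

/* "Before" shape: load both operands and add them at run-time. */
void test_before(void)
{
    x = add_bytes(LC1, LC0);
}

/* "After" shape: the sum was computed at compile-time, only a store remains. */
void test_after(void)
{
    x = LC2;
}
```

Both functions leave the same value in x; the second simply has no arithmetic
left to do at run-time, which is the improvement described above.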