https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102840

--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
With -m64, before:
test:   movq    .LC1(%rip), %mm0
        paddb   .LC0(%rip), %mm0
        movq    %mm0, x(%rip)
        ret

And after:
test:   movq    .LC2(%rip), %rax
        movq    %rax, x(%rip)
        ret

So we have two movq before, and two movq after, but clearly we've avoided the
computation at run-time.
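
As a rough illustration of what gets folded, here is a minimal sketch of the
byte-wise wrapping add that paddb performs on a 64-bit MMX value. The helper
name paddb_fold is hypothetical, and the contents of .LC0/.LC1 are not shown
above, so this is only a model of the arithmetic, not GCC's implementation.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: add two 64-bit values
   as eight independent 8-bit lanes, as paddb does.  Each lane wraps
   modulo 256 and no carry crosses into the neighbouring lane. */
uint64_t paddb_fold(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));   /* lane i of a */
        uint8_t bi = (uint8_t)(b >> (8 * i));   /* lane i of b */
        uint8_t s  = (uint8_t)(ai + bi);        /* wrapping lane sum */
        r |= (uint64_t)s << (8 * i);
    }
    return r;
}
```

When both operands are compile-time constants, a result of this kind can be
emitted as a single 64-bit constant (.LC2 in the "after" listing), leaving
only the load and the store.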

It's difficult (for me) to judge whether -m32's use of immediate constants
is now better than -m64's load memory/store memory idiom in the "average case",
but in the worst case [data cache miss], the former is clearly better [requiring
fewer memory transactions].
