https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Just trying a dumb microbenchmark:
struct S { unsigned long a, b; } s;

__attribute__((noipa)) void
foo (unsigned long a, unsigned long b)
{
  s.a = a;
  s.b = b;
}

int
main ()
{
  int i;
  for (i = 0; i < 1000000000; i++)
    foo (42, 43);
  return 0;
}
The code generated for foo by GCC 11 (-) vs. GCC 12 (+):
-	movq	%rdi, s(%rip)
-	movq	%rsi, s+8(%rip)
+	movq	%rdi, %xmm0
+	movq	%rsi, %xmm1
+	punpcklqdq	%xmm1, %xmm0
+	movaps	%xmm0, s(%rip)
The two versions seem to run at exactly the same speed (on an i9-7960X), and
the GCC 11 code is 7 bytes smaller.
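
For anyone who wants to repeat the comparison, here is a minimal sketch of a
timed variant of the same testcase. The helper name time_foo is hypothetical
and not part of the original testcase, and clock () reports CPU time rather
than wall-clock time, which should be close enough for a single-threaded loop
like this one.

```c
#include <time.h>

struct S { unsigned long a, b; } s;

/* Same store pattern as the microbenchmark; noipa keeps the call from
   being inlined or otherwise optimized across the call boundary.  */
__attribute__((noipa)) void
foo (unsigned long a, unsigned long b)
{
  s.a = a;
  s.b = b;
}

/* Hypothetical helper (an assumption, not from the original comment):
   call foo ITERS times and return the CPU time used, in seconds.  */
double
time_foo (unsigned long iters)
{
  clock_t c0, c1;
  unsigned long i;

  c0 = clock ();
  for (i = 0; i < iters; i++)
    foo (42, 43);
  c1 = clock ();
  return (double) (c1 - c0) / CLOCKS_PER_SEC;
}
```

Calling time_foo (1000000000) from main and printing the result, once per
compiler at the same optimization level, gives the two numbers to compare.
The comment does not say which flags were used, so -O2 is only an assumption
here.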