Hello all,
I am trying to analyze the optimized output of the following code. The
intent is to unpack a 64-bit integer into a struct containing eight
8-bit integers. The optimized result was very promising at first, but
I then discovered that whenever the unpacking function gets inlined
into another function, the optimization no longer works.
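#include <stdint.h>
#include <string.h>

/* a struct of eight 8-bit integers */
struct alpha {
    int8_t a;
    int8_t b;
    ...
    int8_t h;
};

/* copy the 8 bytes of x straight into the struct */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, 8);
    return r;
}

/* trivial wrapper; unpack gets inlined here */
struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}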
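To make the intent concrete: on a little-endian target such as x86-64,
the memcpy should be equivalent to assigning each member from successive
bytes of the input, lowest byte first. The sketch below only illustrates
that equivalence. The members c through g are assumed names for the ones
elided in the listing above, and unpack_by_shifts and check_same are
hypothetical helpers that are not part of the original test case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Members c..g are assumed; the original listing elides them. */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* The byte-copy approach relies on the struct having no padding. */
_Static_assert(sizeof(struct alpha) == sizeof(uint64_t),
               "struct alpha must be exactly 8 bytes");

/* memcpy-based unpack, as in the listing above. */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/*
 * Hypothetical field-by-field version. It matches unpack() only on a
 * little-endian target. The narrowing casts to int8_t are
 * implementation-defined for bytes above 0x7f; gcc wraps modulo 256.
 */
struct alpha unpack_by_shifts(uint64_t x)
{
    struct alpha r;
    r.a = (int8_t)(x >> 0);
    r.b = (int8_t)(x >> 8);
    r.c = (int8_t)(x >> 16);
    r.d = (int8_t)(x >> 24);
    r.e = (int8_t)(x >> 32);
    r.f = (int8_t)(x >> 40);
    r.g = (int8_t)(x >> 48);
    r.h = (int8_t)(x >> 56);
    return r;
}

/* Hypothetical check: aborts if the two versions disagree for x. */
void check_same(uint64_t x)
{
    struct alpha p = unpack(x);
    struct alpha q = unpack_by_shifts(x);
    assert(memcmp(&p, &q, sizeof p) == 0);
}
```

This is just a cross-check of what the function is meant to compute. The
compiler output discussed in this message is for the memcpy version.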
The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.
The `unpack` function optimizes fine. It produces the following
assembly as expected:
mov rax, rdi
ret
Given that `wrapper` is a trivial wrapper around `unpack`, I would
expect the same. But in reality this is what I got from gcc:
mov eax, edi
xor ecx, ecx
mov esi, edi
shr ax, 8
mov cl, dil
shr esi, 24
mov ch, al
mov rax, rdi
movzx edx, sil
and eax, 16711680
and rcx, -16711681
sal rdx, 24
movabs rsi, -4278190081
or rcx, rax
mov rax, rcx
movabs rcx, -1095216660481
and rax, rsi
or rax, rdx
movabs rdx, 1095216660480
and rdx, rdi
and rax, rcx
movabs rcx, -280375465082881
or rax, rdx
movabs rdx, 280375465082880
and rdx, rdi
and rax, rcx
movabs rcx, -71776119061217281
or rax, rdx
movabs rdx, 71776119061217280
and rdx, rdi
and rax, rcx
shr rdi, 56
or rax, rdx
sal rdi, 56
movabs rdx, 72057594037927935
and rax, rdx
or rax, rdi
ret
This seems quite strange. Somehow the inlining process seems to have
screwed up the potential optimizations. Is there some way to prevent
this from happening short of disabling inlining? Or perhaps there is
a better way to write this code so that gcc would optimize more
predictably?
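For discussion, here are two variants that might be worth comparing. This
is only a sketch, and I have not checked what gcc 5.3.0 generates for
either of them. The first keeps the memcpy but marks the one function
with gcc's noinline attribute, so callers pay for a real call. The second
reads the bytes through a union instead of memcpy. All the names
(unpack_noinline, unpack_union, alpha_bits, check_variants) are
hypothetical, and the members c through g stand in for the elided ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Members c..g are assumed; the original listing elides them. */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* Variant 1: same memcpy, but ask gcc not to inline this function. */
struct alpha unpack_noinline(uint64_t x) __attribute__((noinline));

struct alpha unpack_noinline(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

struct alpha wrapper_noinline(uint64_t y)
{
    return unpack_noinline(y);
}

/* Variant 2: reinterpret the bytes through a union instead of memcpy. */
union alpha_bits {
    uint64_t     raw;
    struct alpha fields;
};

struct alpha unpack_union(uint64_t x)
{
    union alpha_bits u;
    u.raw = x;
    return u.fields;
}

struct alpha wrapper_union(uint64_t y)
{
    return unpack_union(y);
}

/* Both variants should return exactly the bytes of the input. */
void check_variants(uint64_t x)
{
    struct alpha p = wrapper_noinline(x);
    struct alpha q = wrapper_union(x);
    assert(memcmp(&p, &x, sizeof p) == 0);
    assert(memcmp(&q, &x, sizeof q) == 0);
}
```

Whether either variant actually avoids the long instruction sequence
would have to be confirmed by looking at the generated assembly.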
I would appreciate any advice, thanks.
Phil