On Fri, Feb 19, 2016 at 7:24 AM, Phil Ruffwind <[email protected]> wrote:
> Hello all,
>
> I am trying to analyze the optimized results of following code. The
> intent is to unpack a 64-bit integer into a struct containing eight
> 8-bit integers. The optimized result was very promising at first, but
> I then discovered that whenever the unpacking function gets inlined
> into another function, the optimization no longer works.
>
> #include <stdint.h>
> #include <string.h>
>
> /* a struct of eight 8-bit integers */
> struct alpha {
>     int8_t a;
>     int8_t b;
>     int8_t c;
>     int8_t d;
>     int8_t e;
>     int8_t f;
>     int8_t g;
>     int8_t h;
> };
>
> struct alpha unpack(uint64_t x)
> {
>     struct alpha r;
>     memcpy(&r, &x, 8);
>     return r;
> }
>
> struct alpha wrapper(uint64_t y)
> {
>     return unpack(y);
> }
>
> The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.
>
> The `unpack` function optimizes fine. It produces the following
> assembly as expected:
>
> mov rax, rdi
> ret
>
> Given that `wrapper` is a trivial wrapper around `unpack`, I would
> expect the same. But in reality this is what I got from gcc:
>
> mov eax, edi
> xor ecx, ecx
> mov esi, edi
> shr ax, 8
> mov cl, dil
> shr esi, 24
> mov ch, al
> mov rax, rdi
> movzx edx, sil
> and eax, 16711680
> and rcx, -16711681
> sal rdx, 24
> movabs rsi, -4278190081
> or rcx, rax
> mov rax, rcx
> movabs rcx, -1095216660481
> and rax, rsi
> or rax, rdx
> movabs rdx, 1095216660480
> and rdx, rdi
> and rax, rcx
> movabs rcx, -280375465082881
> or rax, rdx
> movabs rdx, 280375465082880
> and rdx, rdi
> and rax, rcx
> movabs rcx, -71776119061217281
> or rax, rdx
> movabs rdx, 71776119061217280
> and rdx, rdi
> and rax, rcx
> shr rdi, 56
> or rax, rdx
> sal rdi, 56
> movabs rdx, 72057594037927935
> and rax, rdx
> or rax, rdi
> ret
>
> This seems quite strange. Somehow the inlining process seems to have
> screwed up the potential optimizations. Is there some way to prevent
> this from happening, short of disabling inlining? Or perhaps there is
> a better way to write this code so that gcc would optimize more
> predictably?
It seems to be SRA (scalar replacement of aggregates) "optimizing" the copy it sees in unpack ():
unpack (uint64_t x)
{
  struct alpha r;
  struct alpha D.2276;
  long unsigned int _2;

  <bb 2>:
  _2 = x_6(D);
  MEM[(char * {ref-all})&r] = x_6(D);
  D.2276 = r;            <-- this one
  r ={v} {CLOBBER};
  return D.2276;
}
when inlined into wrapper:
wrapper (uint64_t y)
{
  struct alpha D.2286;
  struct alpha r;
  struct alpha D.2279;

  <bb 2>:
  MEM[(char * {ref-all})&r] = y_2(D);
  D.2286 = r;
  r ={v} {CLOBBER};
  D.2279 = D.2286;
  return D.2279;
}
While this removes the redundant aggregate D.2286, it also results in
implementing the copy byte-wise (for no good reason).  Note that SRA
cannot simply use bigger accesses here because struct alpha is only
aligned to 1 byte.
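
A possible workaround, following directly from that last observation
(untested here, and relying on the GNU aligned attribute), would be to
raise the alignment of struct alpha so that the whole copy can be
covered by a single 8-byte access:

  #include <stdint.h>
  #include <string.h>

  /* Sketch only: give struct alpha 8-byte alignment so one 64-bit
     access can cover the copy.  Whether this restores the single-mov
     code above is not verified, and the attribute changes the type's
     alignment requirements (and hence its ABI). */
  struct alpha {
      int8_t a, b, c, d, e, f, g, h;
  } __attribute__((aligned(8)));

  struct alpha unpack(uint64_t x)
  {
      struct alpha r;
      memcpy(&r, &x, sizeof r);   /* same 8-byte copy as before */
      return r;
  }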
Richard.
> I would appreciate any advice, thanks.
>
> Phil