On Fri, Feb 19, 2016 at 7:24 AM, Phil Ruffwind <r...@rufflewind.com> wrote:
> Hello all,
>
> I am trying to analyze the optimized results of the following code. The
> intent is to unpack a 64-bit integer into a struct containing eight
> 8-bit integers. The optimized result was very promising at first, but
> I then discovered that whenever the unpacking function gets inlined
> into another function, the optimization no longer works.
>
>     /* a struct of eight 8-bit integers */
>     struct alpha {
>         int8_t a;
>         int8_t b;
>         ...
>         int8_t h;
>     };
>
>     struct alpha unpack(uint64_t x)
>     {
>         struct alpha r;
>         memcpy(&r, &x, 8);
>         return r;
>     }
>
>     struct alpha wrapper(uint64_t y)
>     {
>         return unpack(y);
>     }
>
> The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.
>
> The `unpack` function optimizes fine. It produces the following
> assembly as expected:
>
>     mov rax, rdi
>     ret
>
> Given that `wrapper` is a trivial wrapper around `unpack`, I would
> expect the same. But in reality this is what I got from gcc:
>
>     mov eax, edi
>     xor ecx, ecx
>     mov esi, edi
>     shr ax, 8
>     mov cl, dil
>     shr esi, 24
>     mov ch, al
>     mov rax, rdi
>     movzx edx, sil
>     and eax, 16711680
>     and rcx, -16711681
>     sal rdx, 24
>     movabs rsi, -4278190081
>     or rcx, rax
>     mov rax, rcx
>     movabs rcx, -1095216660481
>     and rax, rsi
>     or rax, rdx
>     movabs rdx, 1095216660480
>     and rdx, rdi
>     and rax, rcx
>     movabs rcx, -280375465082881
>     or rax, rdx
>     movabs rdx, 280375465082880
>     and rdx, rdi
>     and rax, rcx
>     movabs rcx, -71776119061217281
>     or rax, rdx
>     movabs rdx, 71776119061217280
>     and rdx, rdi
>     and rax, rcx
>     shr rdi, 56
>     or rax, rdx
>     sal rdi, 56
>     movabs rdx, 72057594037927935
>     and rax, rdx
>     or rax, rdi
>     ret
>
> This seems quite strange. Somehow the inlining process seems to have
> screwed up the potential optimizations. Is there some way to prevent
> this from happening short of disabling inlining? Or perhaps there is
> a better way to write this code so that gcc would optimize more
> predictably?

It seems to be SRA "optimizing" the copy it sees in unpack ()

unpack (uint64_t x)
{
  struct alpha r;
  struct alpha D.2276;
  long unsigned int _2;

  <bb 2>:
  _2 = x_6(D);
  MEM[(char * {ref-all})&r] = x_6(D);
  D.2276 = r;     <-- this one
  r ={v} {CLOBBER};
  return D.2276;

when inlined into wrapper:

wrapper (uint64_t y)
{
  struct alpha D.2286;
  struct alpha r;
  struct alpha D.2279;

  <bb 2>:
  MEM[(char * {ref-all})&r] = y_2(D);
  D.2286 = r;
  r ={v} {CLOBBER};
  D.2279 = D.2286;
  return D.2279;

while this results in removing the redundant aggregate D.2286 it
also results in implementing the copy byte-wise (for no good reason).
Note that it cannot simply use bigger accesses as struct alpha is only
aligned to 1 byte.

Richard.

> I would appreciate any advice, thanks.
>
> Phil
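
To make the alignment remark concrete, here is a minimal sketch in C11. It
checks that struct alpha as posted has an alignment of 1 byte, and shows a
hypothetical variant, alpha_aligned, whose first member is over-aligned so
that the whole struct is 8-byte aligned. The member names c through g, the
alpha_aligned type, and the repack and check_roundtrip helpers are
assumptions made for illustration and are not from the original post.
Whether the extra alignment actually changes what SRA does in gcc 5.3.0 has
not been verified here; the code only demonstrates the alignment difference
and that both variants round-trip the same bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A struct of eight 8-bit integers, as in the original post.
   Members c..g are assumed; the post elides them. */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* Hypothetical variant: same members, but the first one is over-aligned,
   which raises the alignment of the whole struct to 8 bytes. */
struct alpha_aligned {
    _Alignas(8) int8_t a;
    int8_t b, c, d, e, f, g, h;
};

/* Both structs occupy exactly 8 bytes; only their alignment differs. */
_Static_assert(sizeof(struct alpha) == 8, "alpha should be 8 bytes");
_Static_assert(sizeof(struct alpha_aligned) == 8, "alpha_aligned should be 8 bytes");
_Static_assert(_Alignof(struct alpha) == 1, "alpha is only byte-aligned");
_Static_assert(_Alignof(struct alpha_aligned) == 8, "alpha_aligned is 8-byte aligned");

/* unpack and wrapper as posted. */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

/* The same pair for the over-aligned variant. */
struct alpha_aligned unpack_aligned(uint64_t x)
{
    struct alpha_aligned r;
    memcpy(&r, &x, sizeof r);
    return r;
}

struct alpha_aligned wrapper_aligned(uint64_t y)
{
    return unpack_aligned(y);
}

/* Helpers (hypothetical) that copy the struct bytes back into an integer. */
uint64_t repack(struct alpha s)
{
    uint64_t x;
    memcpy(&x, &s, sizeof x);
    return x;
}

uint64_t repack_aligned(struct alpha_aligned s)
{
    uint64_t x;
    memcpy(&x, &s, sizeof x);
    return x;
}

/* Returns 1 if both variants reproduce v after unpacking and repacking. */
int check_roundtrip(uint64_t v)
{
    assert(repack(wrapper(v)) == v);
    assert(repack_aligned(wrapper_aligned(v)) == v);
    return 1;
}
```

Compiling this with -O3 -S and comparing the assembly for wrapper and
wrapper_aligned would be one way to see whether the alignment matters for a
given gcc version.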