How to efficiently unpack 8 bytes from a 64-bit integer?

2016-02-18 Thread Phil Ruffwind
Hello all,

I am trying to analyze the optimized output of the following code.  The
intent is to unpack a 64-bit integer into a struct containing eight
8-bit integers.  The optimized result was very promising at first, but
I then discovered that whenever the unpacking function gets inlined
into another function, the optimization no longer works.

/* a struct of eight 8-bit integers */
struct alpha {
    int8_t a;
    int8_t b;
    ...
    int8_t h;
};

struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, 8);
    return r;
}

struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

The code was compiled with gcc 5.3.0 on Linux 4.4.1 with -O3 on x86-64.
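
For anyone who wants to reproduce this, here is a self-contained sketch of the test case that should compile as-is.  The field names c through g are my assumption for the fields elided in the listing above, and the static_assert is an extra check that is not part of the original snippet.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* a struct of eight 8-bit integers; the names c..g are assumed,
   since the listing above elides them */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

/* the memcpy below relies on the struct being exactly 8 bytes */
static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 8 bytes");

/* copy the 8 bytes of x into the struct, in memory order */
struct alpha unpack(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* trivial wrapper; this is the function whose code generation regresses */
struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}
```

Compiling with something like `gcc -O3 -S -masm=intel test.c` (the file name is hypothetical) should give Intel-syntax assembly comparable to the listings in this message.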

The `unpack` function optimizes fine.  It produces the following
assembly as expected:

mov rax, rdi
ret

Given that `wrapper` is a trivial wrapper around `unpack`, I would
expect the same.  But in reality this is what I got from gcc:

mov eax, edi
xor ecx, ecx
mov esi, edi
shr ax, 8
mov cl, dil
shr esi, 24
mov ch, al
mov rax, rdi
movzx edx, sil
and eax, 16711680
and rcx, -16711681
sal rdx, 24
movabs rsi, -4278190081
or rcx, rax
mov rax, rcx
movabs rcx, -1095216660481
and rax, rsi
or rax, rdx
movabs rdx, 1095216660480
and rdx, rdi
and rax, rcx
movabs rcx, -280375465082881
or rax, rdx
movabs rdx, 280375465082880
and rdx, rdi
and rax, rcx
movabs rcx, -71776119061217281
or rax, rdx
movabs rdx, 71776119061217280
and rdx, rdi
and rax, rcx
shr rdi, 56
or rax, rdx
sal rdi, 56
movabs rdx, 72057594037927935
and rax, rdx
or rax, rdi
ret

This seems quite strange.  Somehow the inlining process seems to have
screwed up the potential optimizations.  Is there some way to prevent
this from happening short of disabling inlining?  Or perhaps there is
a better way to write this code so that gcc would optimize more
predictably?

I would appreciate any advice, thanks.

Phil


Re: How to efficiently unpack 8 bytes from a 64-bit integer?

2016-02-19 Thread Phil Ruffwind
I tried to look for a workaround for this.  It seemed that using a
union instead of memcpy was enough to convince GCC to optimize into a
single "mov".

struct alpha unpack(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}
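
As a sanity check that the union version is a drop-in replacement for the memcpy version, a minimal sketch like the following compares the two byte for byte.  The names unpack_memcpy, unpack_union and same_result are made up for this check, and the fields c through g are assumed since the original struct listing elides them.  Note that reading a union member other than the one last written is fine in C, but as far as I know it is not in C++.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* eight 8-bit integers; the names c..g are assumed */
struct alpha {
    int8_t a, b, c, d, e, f, g, h;
};

static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 8 bytes");

/* variant 1: copy the bytes with memcpy */
static struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* variant 2: reinterpret the bytes through a union */
static struct alpha unpack_union(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}

/* returns 1 when both variants produce byte-identical structs */
int same_result(uint64_t x)
{
    struct alpha m = unpack_memcpy(x);
    struct alpha u = unpack_union(x);
    return memcmp(&m, &u, sizeof m) == 0;
}
```

This only confirms that the two variants compute the same thing; it says nothing about which one GCC optimizes better.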

But that trick turned out to be short-lived.  If I wrap the wrapper
with another function:

struct alpha wrapperwrapper(uint64_t y)
{
    return wrapper(y);
}

I get the same 37-line assembly generated for this function.  What's
even more strange is that if I just define two identical wrappers in
the same translation unit:

struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

struct alpha wrapper2(uint64_t y)
{
    return unpack(y);
}

One of them gets optimized perfectly, while the other fails, even
though the bodies of the two functions are completely identical!


Re: How to efficiently unpack 8 bytes from a 64-bit integer?

2016-02-19 Thread Phil Ruffwind
> Can you please open a bugreport?

Done: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69871