I looked for a workaround. Using a union instead of memcpy seemed to
be enough to convince GCC to optimize the function down to a single
"mov":

    struct alpha unpack(uint64_t x)
    {
        union {
            struct alpha r;
            uint64_t i;
        } u;
        u.i = x;
        return u.r;
    }

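For comparison, here is a minimal self-contained sketch of the two variants side by side. The layout of struct alpha is not shown in this part of the post, so the two-field definition below is purely an assumption (any 8-byte struct without padding would serve), and the names unpack_memcpy and unpack_union are made up for illustration. In C, reading a union member other than the one last written reinterprets the stored bytes, so both functions should return the same object representation; in C++ the union version would be undefined behavior.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Hypothetical layout: the real struct alpha is not shown here. */
struct alpha {
    uint32_t lo;
    uint32_t hi;
};

/* Both tricks only make sense if the struct is exactly 64 bits wide. */
static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 64 bits wide");

/* memcpy-based version: copy the bytes of x into the struct. */
struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* union-based version: write one member, read the other. */
struct alpha unpack_union(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}
```

Which value ends up in which field depends on the byte order of the target, but the two functions always agree with each other.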
But that trick turned out to be short-lived. If I wrap the wrapper
in another function:

    struct alpha wrapperwrapper(uint64_t y)
    {
        return wrapper(y);
    }

I get the same 37-line assembly for this function. Even stranger,
if I just define two identical wrappers in the same translation
unit:

    struct alpha wrapper(uint64_t y)
    {
        return unpack(y);
    }

    struct alpha wrapper2(uint64_t y)
    {
        return unpack(y);
    }

One of them gets optimized perfectly, while the other fails, even
though the bodies of the two functions are completely identical!
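To make this easier to reproduce, the pieces can be collected into a single translation unit, roughly like the sketch below. The struct alpha layout is again an assumption (its real definition is not shown in this part of the post), and the file name repro.c is arbitrary. Compiling with something like "gcc -O2 -S repro.c" lets you compare the assembly emitted for each function; the optimization level that triggers the behavior may differ.

```c
/* repro.c: inspect the output of, e.g., gcc -O2 -S repro.c */
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical layout: the real struct alpha is not shown here. */
struct alpha {
    uint32_t lo;
    uint32_t hi;
};

static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 64 bits wide");

/* Union-based unpacking, as in the post. */
struct alpha unpack(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}

/* Two wrappers with identical bodies. */
struct alpha wrapper(uint64_t y)
{
    return unpack(y);
}

struct alpha wrapper2(uint64_t y)
{
    return unpack(y);
}

/* A wrapper around a wrapper. */
struct alpha wrapperwrapper(uint64_t y)
{
    return wrapper(y);
}
```

All four functions compute the same result, so any difference between them in the generated assembly comes from the optimizer, not from the source.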