Re: How to efficiently unpack 8 bytes from a 64-bit integer?

Richard Biener Fri, 19 Feb 2016 01:53:50 -0800

On Fri, Feb 19, 2016 at 10:44 AM, Phil Ruffwind <r...@rufflewind.com> wrote:
> I tried to look for a workaround for this.  It seemed that using a
> union instead of memcpy was enough to convince GCC to optimize into a
> single "mov".
>
>     struct alpha unpack(uint64_t x)
>     {
>         union {
>             struct alpha r;
>             uint64_t i;
>         } u;
>         u.i = x;
>         return u.r;
>     }
>
> But that trick turned out to be short-lived.  If I wrap the wrapper
> with another function:
>
>     struct alpha wrapperwrapper(uint64_t y)
>     {
>         return wrapper(y);
>     }
>
> I get the same 37-line assembly generated for this function.  What's
> even more strange is that if I just define two identical wrappers in
> the same translation unit:
>
>     struct alpha wrapper(uint64_t y)
>     {
>         return unpack(y);
>     }
>
>     struct alpha wrapper2(uint64_t y)
>     {
>         return unpack(y);
>     }
>
> One of them gets optimized perfectly, while the other fails, even
> though the bodies of the two functions are completely identical!


Yes, as I said, GCC tries to optimize the copy of the returned
aggregate into the caller's return value slot.  GCC hopes for follow-up
optimization opportunities here, but obviously there are none in this
case.

Can you please open a bug report?  We may be able to tweak the SRA
(scalar replacement of aggregates) heuristics in some way here.  Note
that you only get good code because the aggregate is passed and returned
in a register (and thus "alignment" doesn't matter here) - something
which is exposed to GCC too late for SRA to make use of that fact
(well, easily at least).
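
For anyone wanting to reproduce this, here is a minimal self-contained
sketch of the two variants discussed in the thread.  The layout of
struct alpha is an assumption (eight uint8_t members, so the whole
aggregate fits in one 64-bit register on typical 64-bit ABIs); the
definition from the original message is not quoted here.  The names
unpack_memcpy and unpack_union are also made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: the thread's real definition of struct alpha is not
   quoted in this message, so eight single-byte fields are a guess. */
struct alpha {
    uint8_t b[8];
};

/* Both tricks rely on the struct being exactly as wide as the integer. */
static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 8 bytes");

/* Variant 1: copy the object representation with memcpy. */
struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Variant 2: type-pun through a union, as in the quoted workaround. */
struct alpha unpack_union(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}

/* A trivial wrapper, to see whether the copy into the caller's
   return slot is still optimized away after inlining. */
struct alpha wrapper(uint64_t y)
{
    return unpack_union(y);
}
```

Both variants return the same bytes regardless of endianness; what
differs is only the code GCC generates.  Compiling with something like
"gcc -O2 -S" and comparing the assembly for each function should show
whether a given GCC version is affected.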

Richard.
