https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
This LEA stuff is part of what gcc does to align the stack by 32 for spilling
AVX locals.

Gcc's stack-align sequence is over-complicated and ties up an extra register
for the whole function (add volatile to the local and see the -O3 code). Or
at least it was: it seems gcc8 trunk just makes a stack frame with EBP / RBP,
but references 32-byte aligned locals from the aligned RSP instead of the
unaligned RBP.

It used to copy the address of the return address to make a full copy of
ret-addr / saved-RBP for the aligned stack frame, which was super weird.

See https://godbolt.org/g/RLJNtd. (With an alloca or something, gcc8 does the
same crazy stack-frame stuff as gcc7; otherwise it's much cleaner, like clang.)

----

The actual bug here is that the stack-alignment setup isn't fully optimized
away when it turns out that no 32-byte spills / reloads from locals are left
in the function.

gcc for x86-64 sometimes has a few leftover instructions like that in more
complex functions using __m256, so this is not exclusively an i386 problem,
but it seems to happen more easily for 32-bit.
