https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274
--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
This LEA stuff is part of what gcc does to align the stack by 32 for spilling
AVX locals.
Gcc's stack-alignment sequence is over-complicated and ties up an extra register
for the whole function (add volatile to the local and look at the -O3 code). Or
at least it was: gcc8 trunk seems to just make a stack frame with EBP / RBP,
but it references the 32-byte-aligned locals from the aligned RSP instead of
the unaligned RBP.
It used to copy the address of the return address to make a full copy of
ret-addr / saved-RBP for the aligned stack frame, which was super weird.
See https://godbolt.org/g/RLJNtd. (With an alloca or something similar, gcc8
does the same crazy stack-frame setup as gcc7; otherwise its code is much
cleaner, like clang's.)
----
The actual bug here is that this alignment setup is not fully optimized away
when it turns out that no 32-byte spills / reloads of locals are left in the
function.
gcc for x86-64 sometimes leaves a few instructions like that in more complex
functions using __m256, so this is not exclusively an i386 problem, but it
seems to happen more easily for 32-bit.