https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274
Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
This LEA stuff is part of what gcc does to align the stack by 32 for spilling
AVX locals.

gcc's stack-align sequence is over-complicated and ties up an extra register
for the whole function (add volatile to the local and see the -O3 code). Or at
least it was; it seems gcc8 trunk just makes a stack frame with EBP / RBP, but
references 32-byte-aligned locals from the aligned RSP instead of the unaligned
RBP.

It used to copy the address of the return address to make a full copy of
ret-addr / saved-RBP for the aligned stack frame, which was super weird.
https://godbolt.org/g/RLJNtd (With an alloca or something, gcc8 does the same
crazy stack-frame stuff as gcc7; otherwise it's much cleaner, like clang.)

----

The actual bug here is that the stack-alignment code is not fully optimized
away when it turns out that no 32-byte spills / reloads from locals are left in
the function.

gcc for x86-64 sometimes has a few leftover instructions like that in more
complex functions using __m256; this is not exclusively an i386 problem, but it
seems to happen more easily for 32-bit.
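
For anyone who wants to look at the realignment code without the original test case, here is a minimal sketch. It is not the code from this bug report. It assumes that a volatile local with `_Alignas(32)` is a reasonable stand-in for a spilled __m256 local, since both need a 32-byte-aligned stack slot and the ABI only guarantees less than that on function entry. The function names are made up for illustration, and no AVX flags are needed to build it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a spilled __m256 local: a 32-byte-aligned,
   volatile buffer, so the compiler has to keep it in memory and has to
   realign the stack to place it. */
uintptr_t aligned_local_misalignment(unsigned char seed)
{
    _Alignas(32) volatile unsigned char buf[32];

    /* Volatile stores cannot be optimized away, so the slot stays live. */
    for (unsigned i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(seed + i);

    /* Low five bits of the address; zero means 32-byte aligned. */
    return (uintptr_t)buf & 31u;
}

/* Small self-check: the local must actually sit on a 32-byte boundary. */
void demo(void)
{
    assert(aligned_local_misalignment(1) == 0);
}
```

Compiling this with `gcc -O3 -S` (adding `-m32` for the i386 case) and comparing gcc versions should show how the frame is set up around `buf`; the exact instructions depend on the compiler version and target, so treat any particular output as something to check rather than assume.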