http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56165
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-03 13:07:35 UTC ---
(In reply to comment #11)
> (In reply to comment #10)
> > You're wrong. That is to maintain the ABI, which for x86_64 says that the
> > stack is 16-byte aligned. Consider e.g. the noreturn function using SSE
> > instructions, without that subq $8, %rsp the stack in the noreturn function
> > would be not properly aligned to 16-bytes and any movdqa and similar insns on
> > stack slots would crash.
>
> See carefully my compile keys (from first message):
> x86_64-linux-gnu-gcc -c -Wall -Wno-attributes -save-temps -fverbose-asm
> -masm=intel -march=core2 -mcmodel=large -mno-mmx -mno-sse -O1 -fno-rtti
> -fno-default-inline -fomit-frame-pointer -falign-functions=16
> -foptimize-sibling-calls -ffreestanding -fno-stack-protector --no-exceptions
>
> There is:
> -mno-mmx
> -mno-sse
> and in the long run
> -fomit-frame-pointer
>
> I do not need your charge with your injected code at all. Please understand
> me.

That is completely irrelevant. The noreturn function is usually defined in
some other CU, so you don't know what compiler flags it will be compiled with,
and -mpreferred-stack-boundary=4 (i.e. 16-byte alignment) is the smallest
supported alignment for x86_64. Even if you don't use SSE etc., the compiler
is allowed to assume 16-byte alignment of the stack pointer, and does so in
many places.

Note that -mcmodel=large in your flags causes a much bigger slowdown than what
you are complaining about here.
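
The alignment argument above can be sketched as plain arithmetic on the stack pointer. This is only an illustrative model, not code from GCC: the helper names and the sample address are made up, and it assumes the x86_64 psABI rule that (%rsp + 8) is a multiple of 16 at function entry, because the call instruction has just pushed an 8-byte return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when sp is a multiple of 16. */
static bool is_aligned16(uint64_t sp)
{
    return (sp & 15u) == 0;
}

/* Models `subq $n, %rsp`: the stack grows downward. */
static uint64_t sp_after_sub(uint64_t sp, uint64_t n)
{
    return sp - n;
}

/* Models `call`: pushes an 8-byte return address. */
static uint64_t sp_after_call(uint64_t sp)
{
    return sp - 8;
}

/* rsp seen at callee entry if the caller first lowers rsp by `adjust` bytes. */
static uint64_t callee_entry_sp(uint64_t caller_entry_sp, uint64_t adjust)
{
    return sp_after_call(sp_after_sub(caller_entry_sp, adjust));
}

/* Walks through the case discussed in this comment with a made-up address. */
void demo_alignment(void)
{
    /* Assumed rsp at entry to the caller: 8 modulo 16, as the ABI requires. */
    const uint64_t entry = 0x7fffffffe008ULL;

    assert(!is_aligned16(entry));
    assert(is_aligned16(entry + 8));

    /* With `subq $8, %rsp`, rsp is 16-byte aligned at the call site ... */
    assert(is_aligned16(sp_after_sub(entry, 8)));
    /* ... so the callee (noreturn or not) sees a conforming entry rsp. */
    assert(is_aligned16(callee_entry_sp(entry, 8) + 8));

    /* Without the adjustment, the callee's entry rsp violates the ABI. */
    assert(!is_aligned16(callee_entry_sp(entry, 0) + 8));
}
```

Calling demo_alignment() from a main function runs the checks. The last assertion is the case the reporter asked for: dropping the subq leaves the callee with a stack pointer that any code relying on 16-byte alignment, such as a movdqa on a stack slot, could fault on.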