https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111896

            Bug ID: 111896
           Summary: call with wrong stack alignment
           Product: gcc
           Version: 9.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lukas.gra...@tu-darmstadt.de
  Target Milestone: ---

Created attachment 56157
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56157&action=edit
ccnhCTdD.i.tmp.i

I managed to get a SEGV when running a program. After some debugging, I found
that the faulting instruction was:

    movaps %xmm0,0x40(%rsp)

It took me some time, but I realized the SEGV was caused by rsp being 8 bytes
off. movaps requires a 16-byte aligned memory operand, so rsp should have been
aligned to 16 bytes here. In other words, the stack alignment was wrong.

I also found out where the misalignment happened. See the attached file.
dlist_free_original() is calling freeit(). This is compiled as
dlist_free_original.constprop.0 calling do_line() as follows:

dlist_free_original.constprop.0:
    ...
    pushq %rbp
    ...
    pushq %rbx
    ...
    call do_line

So the stack is misaligned when the call happens. It might be because
do_line() is written in inline asm with __attribute__((naked)).

Starting with gcc 11.3, there seems to be an extra "sub rsp,8", which seems to
solve this. But I was using gcc 9.4.0 (shipped with Ubuntu 20.04) on amd64
Linux. A quick check on godbolt showed me that the misalignment still happens
in gcc 11.2. So I am unsure whether this is still relevant, but I am reporting
it just in case.

    gcc -O3 -c -S ccnhCTdD.i.tmp.i -o tmp.s

If you need the full executable or anything else, ask me.

Background: I wanted a way to record which functions were called through a
pointer. For that, I created a wrapper for every function, renaming the
original function to ..._original. I also created a macro renaming direct
calls to _original, so that only calls through a pointer were left. The
wrapper functions do their logging (it takes only a few instructions) and
then sibcall to the respective original function.
A wrapper for vararg functions seemed to be possible only in asm, so I used asm. Since the other functions might be static, I had to use inline asm with __attribute__((naked)).
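
To make the wrapper scheme from the background easier to follow, here is a minimal sketch of it in plain C, without the naked inline asm or the vararg case. All names (add, add_original, add_pointer_calls, call_through_pointer, wrapper_demo) are hypothetical and not taken from the attachment, and the function-like macro is only one possible way to rename direct calls; the real macro may look different.

```c
#include <assert.h>

/* Hypothetical function standing in for any function of the program,
   already renamed to ..._original. */
static int add_original(int a, int b)
{
    return a + b;
}

/* Counter standing in for the "logging" done by the wrapper. */
static unsigned long add_pointer_calls;

/* Wrapper that keeps the original name: log, then call the original.
   With optimization enabled, GCC will typically emit the call as a
   sibling call (a jmp), but that is not guaranteed. */
static int add(int a, int b)
{
    add_pointer_calls++;
    return add_original(a, b);
}

/* Function-like macro: it expands only where `add` is followed by `(`.
   Direct calls are renamed to add_original, while `add` used as a value
   (for example when stored in a function pointer) still names the wrapper. */
#define add(a, b) add_original((a), (b))

/* Any call made through a pointer ends up in the wrapper. */
static int call_through_pointer(int (*fn)(int, int), int a, int b)
{
    return fn(a, b);
}

/* Self-check: returns how many calls the wrapper has recorded. */
static unsigned long wrapper_demo(void)
{
    add_pointer_calls = 0;
    int direct = add(1, 2);                         /* macro: add_original */
    int indirect = call_through_pointer(add, 3, 4); /* pointer: wrapper */
    assert(direct == 3 && indirect == 7);
    return add_pointer_calls;
}
```

Calling wrapper_demo() from main() records only the call made through the pointer; the direct call bypasses the wrapper.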
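
The alignment arithmetic behind the report can be modelled with a few lines of C. This is only a sketch, and it makes two assumptions: the x86-64 System V rule that rsp is a multiple of 16 immediately before a call (so rsp % 16 == 8 at function entry), and that the two pushes shown in the listing are the only stack adjustments before "call do_line" (the "..." parts might hide more). The helper names are made up for illustration.

```c
#include <assert.h>

#define STACK_ALIGN 16u /* required alignment of rsp right before a call */
#define SLOT_SIZE    8u /* size of one pushq or of a "sub rsp,8" */

/* rsp modulo 16 after `slots` 8-byte adjustments, starting from a given
   residue. The stack grows down, so each slot subtracts 8 from rsp. */
static unsigned rsp_mod16_after(unsigned entry_mod16, unsigned slots)
{
    unsigned down = (SLOT_SIZE * slots) % STACK_ALIGN;
    return (entry_mod16 + STACK_ALIGN - down) % STACK_ALIGN;
}

/* Returns 1 if rsp is 16-byte aligned at the call site of a function that
   was itself entered with a correctly aligned stack (rsp % 16 == 8). */
static int call_site_aligned(unsigned slots)
{
    return rsp_mod16_after(8u, slots) == 0u;
}

/* Self-check for the two cases discussed in the report. */
static void alignment_demo(void)
{
    /* pushq %rbp; pushq %rbx; call do_line: rsp % 16 == 8 at the call. */
    assert(!call_site_aligned(2));
    /* The same with an extra "sub rsp,8": rsp % 16 == 0 at the call. */
    assert(call_site_aligned(3));
}
```

Under these assumptions, two pushes leave rsp 8 bytes off at the call, and one additional 8-byte adjustment restores the alignment, which matches the "sub rsp,8" seen with gcc 11.3.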