https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87414
Bug ID: 87414
Summary: -mindirect-branch=thunk produces thunk with incorrect CFI on x86_64
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: fw at gcc dot gnu.org
CC: hjl.tools at gmail dot com
Target Milestone: ---
Target: x86_64
GCC 9.0.0 (20180924) generates these thunks on x86-64:
__x86_indirect_thunk_rdi:
.LFB1:
        .cfi_startproc
        call    .LIND1
.LIND0:
        pause
        lfence
        jmp     .LIND0
.LIND1:
        mov     %rdi, (%rsp)
        ret
        .cfi_endproc
.LFE1:
I don't think the CFI is correct. At the ret instruction, the CFI still
indicates that the return address is at the top of the stack, but that
slot now holds the branch target stored by the mov. The unwinder will
use this value as the return address and subtract one because it's a
non-signal handler frame. The resulting address is located before the
start of the target function, so the unwinder will locate an incorrect
FDE based on it.
Indeed I see this when si-stepping through the execution with GDB:
(gdb) disas
Dump of assembler code for function __x86_indirect_thunk_rdi:
0x00000000004004a5 <+0>: callq 0x4004b1 <__x86_indirect_thunk_rdi+12>
0x00000000004004aa <+5>: pause
0x00000000004004ac <+7>: lfence
0x00000000004004af <+10>: jmp 0x4004aa <__x86_indirect_thunk_rdi+5>
0x00000000004004b1 <+12>: mov %rdi,(%rsp)
=> 0x00000000004004b5 <+16>: retq
0x00000000004004b6 <+17>: nopw %cs:0x0(%rax,%rax,1)
End of assembler dump.
(gdb) bt
#0 0x00000000004004b5 in __x86_indirect_thunk_rdi ()
#1 0x0000000000400490 in frame_dummy () at /tmp/cfi.c:16
#2 0x000000000040038e in main () at /tmp/cfi.c:16
(gdb) print f2
$1 = {int (void)} 0x400490 <f2>
Note the “frame_dummy” instead of “f2” in the backtrace.
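The lookup that leads to “frame_dummy” can be modelled roughly as follows. This is only a sketch of the pc - 1 rule, not the code of any real unwinder: the start of f2 and the extent of the thunk come from the GDB session above, while the start of frame_dummy and the end of f2 are hypothetical values chosen for illustration, and the names fde_range, lookup and lookup_caller are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* One address range per function, standing in for an FDE.  */
struct fde_range
{
  const char *name;
  uintptr_t start;  /* first byte of the function */
  uintptr_t end;    /* one past the last byte */
};

/* 0x400490 (start of f2) and 0x4004a5..0x4004b5 (the thunk) are taken
   from the GDB session.  The start of frame_dummy (0x400450) and the
   end of f2 (0x400497) are HYPOTHETICAL.  */
const struct fde_range ranges[] =
{
  { "frame_dummy", 0x400450, 0x400490 },
  { "f2", 0x400490, 0x400497 },
  { "__x86_indirect_thunk_rdi", 0x4004a5, 0x4004b6 },
};

/* Return the name of the function whose range contains PC, or NULL.  */
const char *
lookup (uintptr_t pc)
{
  for (size_t i = 0; i < sizeof ranges / sizeof ranges[0]; i++)
    if (pc >= ranges[i].start && pc < ranges[i].end)
      return ranges[i].name;
  return NULL;
}

/* For a non-signal caller frame, the lookup uses the return address
   minus one, so that it lands inside the call instruction.  */
const char *
lookup_caller (uintptr_t return_address)
{
  return lookup (return_address - 1);
}
```

With the value 0x400490 found at the top of the stack, lookup_caller searches for 0x40048f, which under these assumptions falls in the function placed just before f2, while a direct lookup of 0x400490 finds f2.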
Test program:
__attribute__ ((weak))
int
f1 (int (*f2) (void))
{
  return f2 ();
}

int
f2 (void)
{
}

int
main (void)
{
  f1 (f2);
}
We had a bit of an internal debate about whether it's actually possible to
produce correct CFI for this. I think we can reflect the stack pointer
adjustment after the thunk-internal call in the CFI, so that the unwinder
continues to see the original caller of the thunk. Due to the address
decrement, the adjustment needs to take effect at the jmp instruction, not
after the .LIND1 label.
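One possible reading of the point about the address decrement is sketched below as a simplified CFI row table. It assumes the usual x86-64 rule that the CFA is %rsp + 8 on function entry and %rsp + 16 once the thunk-internal call has pushed a word. The code offsets (jmp at +10, mov at +12, ret at +16) are those in the disassembly above; the struct, the function and the two tables are hypothetical names used for illustration only.

```c
#include <stddef.h>

/* One row of a simplified CFI table: from code offset FROM onward,
   CFA = %rsp + CFA_OFFSET.  */
struct cfi_row
{
  unsigned from;
  unsigned cfa_offset;
};

/* Return the CFA offset in effect at code offset OFFSET.  ROWS must be
   sorted by FROM, and the first row must start at offset 0.  */
unsigned
cfa_offset_at (const struct cfi_row *rows, size_t n, unsigned offset)
{
  unsigned result = rows[0].cfa_offset;
  for (size_t i = 0; i < n && rows[i].from <= offset; i++)
    result = rows[i].cfa_offset;
  return result;
}

/* Adjustment placed after the .LIND1 label (mov at +12).  */
const struct cfi_row adjust_at_lind1[] = { { 0, 8 }, { 12, 16 } };

/* Adjustment placed so that it already covers the jmp at +10.  */
const struct cfi_row adjust_at_jmp[] = { { 0, 8 }, { 10, 16 } };
```

An unwinder that looks up the mov at +12 with the decrement applied queries offset 11, which lies inside the jmp. In this model, adjust_at_lind1 still yields 8 there, whereas adjust_at_jmp yields 16, which matches the real stack layout at the mov and ret instructions.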
As an alternative, it would be possible to error out when
-mindirect-branch=thunk is used with -fasynchronous-unwind-tables, but since
the latter is the default, this would be a bit harsh.