[Bug target/84039] New: x86 retpolines and CFI

fw at gcc dot gnu.org Thu, 25 Jan 2018 06:13:22 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84039


            Bug ID: 84039
           Summary: x86 retpolines and CFI
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fw at gcc dot gnu.org
  Target Milestone: ---
            Target: x86-64

I tried this:

struct C {
  virtual ~C();
  virtual void f();
};

void
f (C *p)
{
  p->f();
  p->f();
}

with r256939 and -mindirect-branch=thunk -O2 on x86-64 GNU/Linux, and got this:

_Z1fP1C:
.LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    (%rdi), %rax
        movq    %rdi, %rbx
        jmp     .LIND1
.LIND0:
        pushq   16(%rax)
        jmp     __x86_indirect_thunk
.LIND1:
        call    .LIND0
        movq    (%rbx), %rax
        movq    %rbx, %rdi
        popq    %rbx
        .cfi_def_cfa_offset 8
        movq    16(%rax), %rax
        jmp     __x86_indirect_thunk_rax
        .cfi_endproc

This doesn't look quite right.  x86-64 is supposed to have asynchronous unwind
tables by default, but there is nothing that reflects the change in the
(relative) frame address after .LIND0.  I think that region really has to be
moved outside of the .cfi_startproc/.cfi_endproc bracket.
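
To make the mismatch concrete, here is a rough model of the executed path through f up to the first jump into the thunk. It is only a sketch: Step, first_call_path and stale_cfi are names made up for illustration, and the "declared" column is read off the listing above by hand on the assumption that a .cfi_def_cfa_offset stays in effect for all following addresses until the next one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One executed instruction of f(C*), taken from the listing above.
struct Step {
    std::string insn;  // instruction text, for reporting only
    int declared;      // CFA offset the CFI gives for this address (assumed, read by hand)
    int pushed;        // bytes this instruction pushes onto the stack
};

// Executed path up to the first jump into the thunk (hypothetical helper).
inline std::vector<Step> first_call_path() {
    return {
        {"pushq %rbx", 8, 8},
        {"movq (%rdi), %rax", 16, 0},
        {"movq %rdi, %rbx", 16, 0},
        {"jmp .LIND1", 16, 0},
        {"call .LIND0", 16, 8},            // pushes a return address
        {"pushq 16(%rax)", 16, 8},         // first instruction at .LIND0
        {"jmp __x86_indirect_thunk", 16, 0},
    };
}

// Returns the instructions at which the real CFA offset (CFA - %rsp)
// differs from the one the CFI declares.
inline std::vector<std::string> stale_cfi(const std::vector<Step>& path) {
    std::vector<std::string> bad;
    int actual = 8;  // on entry only the return address is on the stack
    for (const Step& s : path) {
        assert(s.pushed >= 0);
        if (s.declared != actual) bad.push_back(s.insn);
        actual += s.pushed;
    }
    return bad;
}
```

Calling stale_cfi(first_call_path()) in this model flags exactly the two instructions at .LIND0: the real offset there is 24 and then 32, while the table still says 16.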

There is a different issue with the thunk itself.

__x86_indirect_thunk_rax:
.LFB2:
        .cfi_startproc
        call    .LIND5
.LIND4:
        pause
        lfence
        jmp     .LIND4
.LIND5:
        mov     %rax, (%rsp)
        ret
        .cfi_endproc

If a signal is delivered after the mov has executed, the unwinder will
eventually unwind through the signal frame and hit __x86_indirect_thunk_rax. 
It does not treat it as a signal frame, so the return address on the stack is
decremented by one, in an attempt to obtain a program counter value which is
within the call instruction. However, in this scenario, the return address is
actually the start of the function, and subtracting one moves the program
counter out of the unwind region for that function.
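
The off-by-one can be sketched as follows. This is a simplified model and not the libgcc code: Fde, covers and lookup_pc are hypothetical names, and the addresses used with them are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Address range covered by one function's unwind entry: [begin, end).
struct Fde {
    std::uint64_t begin;
    std::uint64_t end;
};

inline bool covers(const Fde& f, std::uint64_t pc) {
    assert(f.begin < f.end);
    return pc >= f.begin && pc < f.end;
}

// Program counter used for the unwind table lookup. For an ordinary
// frame the return address is decremented by one so that it falls
// inside the call instruction; for a signal frame it is used as is.
inline std::uint64_t lookup_pc(std::uint64_t return_address, bool signal_frame) {
    return signal_frame ? return_address : return_address - 1;
}
```

With a target function occupying, say, [0x1000, 0x1040), a return address of 0x1000 in an ordinary frame gives a lookup PC of 0xfff, which covers() rejects. A return address in the middle of the function is still found after the decrement.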

It should be possible to fix the thunk function by changing the CFA offset
(using “.cfi_def_cfa_offset 16”) before the target address at the top of the
stack is overwritten, so that the unwinder never tries to unwind through a
function which has not yet started running (which is what causes the off-by-one
issue described above).
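
A small model of why the larger CFA offset would help, assuming the unwinder reads the return address from CFA - 8 and computes the CFA as %rsp plus the declared offset. The function name and the addresses are made up for illustration; this is not GCC's actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The stack as seen at .LIND5 in the thunk. Index 0 is the 8-byte slot
// at (%rsp), index 1 the slot at 8(%rsp), and so on.
// Returns the value an unwinder would take as the return address when
// CFA = %rsp + cfa_offset and the return address lives at CFA - 8.
inline std::uint64_t unwound_return_address(
        const std::vector<std::uint64_t>& stack, int cfa_offset) {
    assert(cfa_offset >= 8 && cfa_offset % 8 == 0);
    std::size_t slot = static_cast<std::size_t>(cfa_offset / 8 - 1);
    assert(slot < stack.size());
    return stack[slot];
}
```

After the mov, the slot at (%rsp) holds the start address of the target function and the slot above it holds the original return address. With an offset of 8 the model returns the target's start address, which is the problematic case. With an offset of 16 it skips that slot and returns the original return address.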

Both issues are visible in GDB if you set breakpoints in the proper places,
because the frame information used for debugging is incorrect as well.

Mailing list thread:

https://gcc.gnu.org/ml/gcc/2018-01/msg00160.html

This may impact the kernel after all.  We have a report of Systemtap not
working, which could be related:

https://sourceware.org/ml/systemtap/2018-q1/msg00008.html
