On Mon, 2017-04-03 at 23:15 +0200, Mark Wielaard wrote: > On Mon, Apr 03, 2017 at 11:02:53AM +0200, Ulf Hermann wrote: > > > Ping? Any progress on merging this functionality upstream? > > > It can make quite a difference in unwinding. > > > > The patches have also been in perfparser releases for over a year now. I > > would like to see them upstream. > > Yes, sorry. The patches looked fine. Then I thought I should try > adding similar support to another arch to double check. And then > I got distracted by something else. I'll get back to is asap and > make sure they get in before the next release end of the month.
OK. Sorry for the delay. In general I like these patches. I even made a i386 fp unwind backend function to play with the idea a bit more. See attached. Does that look sane? I do agree with Jan that frame pointer unwinding is notoriously untrustworthy. Even with some sanity checks it is hard to know whether you are doing a correct unwind. gdb gets away with it through pretty advanced frame sniffers, which take a lot of low-level compiler generation knowledge to get correct (and even then...). You only restore the pc, stack and frame pointer registers (maybe incorrectly). So afterwards, even if you got valid CFI data you might be stuck. Ideally we provide the user with a flag to check how trustworthy a frame is and/or an option to drop inexact unwinding completely. But that would be an extension for another time. I do think that in the context of dwfl_thread_getframes () it is appropriate. Currently all you can get from a Dwfl_Frame is the pc anyway. I do however have a problem with the testcases. I approved the --allow-unknown option for the backtrace fp core tests (commit f9971cb) but I now believe that was a mistake. It makes it really hard to see whether the test actually tests anything. In fact for my i386 case the testcase even succeeded when disabling the whole frame pointer unwinder fallback... I believe the issue with the missing symbol names is not caused by the frame pointer unwinding not matching real function addresses, because they should! But because of the way the testcase binaries are generated. What you do is the following: # gcc -static -O2 -fno-omit-frame-pointer -I ../ -I ../lib/ -D_GNU_SOURCE # -pthread -o backtrace.x86_64.fp.exec backtrace-child.c # # Then strip the .eh_frame and .eh_frame_hdr sections from the binary: # # strip -R .eh_frame backtrace.x86_64.fp.exec # strip -R .eh_frame_hdr backtrace.x86_64.fp.exec # # The core is generated by calling the binary with --gencore But the .eh_frame sections are allocated sections. Which means that by removing them strip will alter the address layout of the program. Which causes the symbol table addresses to not match up anymore. I think they should be generated with -fno-asynchronous-unwind-tables to avoid them being generated in the first place (in the main binary functions, but keep those from the static library functions pulled in): # gcc -static -O2 -fno-omit-frame-pointer -fno-asynchronous-unwind-tables \ # -D_GNU_SOURCE -pthread -o tests/backtrace.i386.fp.exec -I. -Ilib \ # tests/backtrace-child.c That way the difference between having a frame pointer unwinder or not is much clearer: $ eu-stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core PID 12044 - core TID 12045: #0 0x008a7416 __kernel_vsyscall #1 0x08051ab9 raise #2 0x080485c1 sigusr2 eu-stack: dwfl_thread_getframes tid 12045 at 0x80485c0 in [exe]: No DWARF information found TID 12044: #0 0x008a7416 __kernel_vsyscall #1 0x0804ae30 pthread_join #2 0x0804847c main eu-stack: dwfl_thread_getframes tid 12044 at 0x804847b in [exe]: No DWARF information found $ LD_LIBRARY_PATH=backends:libelf:libdw src/stack --exec ./backtrace.i386.fp.exec --core ./backtrace.i386.fp.core PID 12044 - core TID 12045: #0 0x008a7416 __kernel_vsyscall #1 0x08051ab9 raise #2 0x080485c1 sigusr2 #3 0x08048699 stdarg #4 0x08048702 backtracegen #5 0x0804871b start #6 0x0804a7cf start_thread #7 0x080746fe __clone TID 12044: #0 0x008a7416 __kernel_vsyscall #1 0x0804ae30 pthread_join #2 0x0804847c main #3 0x08053188 __libc_start_main #4 0x080481e1 _start src/stack: dwfl_thread_getframes tid 12044 at 0x80481e0 in [exe]: No DWARF information found Then if we check for one of the function names in the middle of the frame pointer only unwound frames, like backtracegen, this would give us a much better test of whether things worked correctly. Could you try regenerating your test binaries with -fno-asynchronous-unwind-tables? Then I'll try to adapt the testsuite to check for the one of the frame pointer only unwound function names. Thanks, Mark
/* Get previous frame state for an existing frame state using frame pointers. Copyright (C) 2017 Red Hat, Inc. This file is part of elfutils. This file is free software; you can redistribute it and/or modify it under the terms of either * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version or both in parallel, as here. elfutils is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ #ifdef HAVE_CONFIG_H # include <config.h> #endif #include <stdlib.h> #include <assert.h> #define BACKEND i386_ #include "libebl_CPU.h" /* Register numbers for frame and stack pointers. We take advantage of them being next to each other when calling getfunc and setfunc. */ #define ESP 4 #define EBP (ESP + 1) /* Most basic frame pointer chasing with EBP as frame pointer. PC = *(FP + 4), SP = FP + 8, FP = *FP. */ bool i386_unwind (Ebl *ebl __attribute__ ((unused)), Dwarf_Addr pc __attribute__ ((unused)), ebl_tid_registers_t *setfunc, ebl_tid_registers_get_t *getfunc, ebl_pid_memory_read_t *readfunc, void *arg, bool *signal_framep __attribute__ ((unused))) { /* sp = 0, fp = 1 */ Dwarf_Word regs[2]; /* Get current stack and frame pointers. */ if (! getfunc (ESP, 2, regs, arg)) return false; Dwarf_Word sp = regs[0]; Dwarf_Word fp = regs[1]; /* Sanity check. We only support traditional stack frames. */ if (fp == 0 || sp == 0 || fp < sp) return false; /* Get the return address from the stack, it is our new pc. */ Dwarf_Word ret_addr; if (! readfunc (fp + 4, &ret_addr, arg) || ret_addr == 0) return false; /* Get new sp and fp. */ sp = fp + 8; if (! readfunc (fp, &fp, arg) || fp == 0 || sp >= fp) return false; /* Set new sp, fp and pc. */ regs[0] = sp; regs[1] = fp; if (! setfunc (ESP, 2, regs, arg) || ! setfunc (-1, 1, &ret_addr, arg)) return false; return true; }