[Bug libdw/30272] New: Unwinding multithreaded musl applications fails
https://sourceware.org/bugzilla/show_bug.cgi?id=30272

            Bug ID: 30272
           Summary: Unwinding multithreaded musl applications fails
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Unwinding multithreaded applications linked against musl libc on x86-64 seems to fail, getting stuck on `__clone`:

    TID 241:
    ...
    #20 0x7f6f2f74f08b start
    #21 0x7f6f2f75138e __clone
    #22 0x7f6f2f75138e __clone
    #23 0x7f6f2f75138e __clone
    ...
    #253 0x7f6f2f75138e __clone
    #254 0x7f6f2f75138e __clone
    #255 0x7f6f2f75138e __clone
    eu-stack: tid 241: shown max number of frames (256, use -n 0 for unlimited)

GDB seems to detect the condition that libdw is getting stuck on, emitting a warning message:

    #44 0x7f8f83e4d08b in start (p=0x7f8f836b8b00) at src/thread/pthread_create.c:203
    #45 0x7f8f83e4f38e in __clone () at src/thread/x86_64/clone.s:22
    Backtrace stopped: frame did not save the PC

I believe it's detecting that two frames in a row have the same DWARF CFA, if I understand correctly.

Reproducer:

    docker run -it --privileged python:3.10-alpine sh

And in the container:

    apk add --update musl-dbg elfutils
    python3.10 -c "import os, threading; threading.Thread(target=lambda: os.system(f'eu-stack --pid={os.getpid()}')).start()"

That spawns a thread that forks a subprocess that runs `eu-stack` on its parent, and reproduces the issue. If you remove the thread and just run:

    python3.10 -c "import os; os.system(f'eu-stack --pid={os.getpid()}')"

then unwinding succeeds, ending at `_start`.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
[Bug libdw/30272] Unwinding multithreaded musl applications fails
https://sourceware.org/bugzilla/show_bug.cgi?id=30272

--- Comment #1 from Matt Wozniski ---
I encountered this issue using `dwfl_getthread_frames`. As a workaround, I call `dwfl_frame_reg` in the callback to check whether the stack pointer register is the same for two frames in a row, and break out if so. That seems to work around it.

I'm not sure if that's entirely correct, though. Are there any legitimate cases where two different frames passed to the callback would have the same stack pointer? My impression is that the stack pointer should change for every function call because the return address is stored on the stack, but perhaps there are some architectures where that isn't the case...
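The workaround described in this comment can be sketched in isolation, without linking against libdw. The block below is only an illustration of the "same stack pointer twice in a row" check: `struct sp_guard`, `sp_guard_should_stop`, and `count_frames` are hypothetical names that do not come from elfutils, and the stack pointer values are assumed to have been read already (for example via `dwfl_frame_reg`). Whether this heuristic is correct on every architecture is exactly the open question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state carried from one frame callback to the next. */
struct sp_guard {
    bool have_prev;    /* false until the first frame has been seen */
    uint64_t prev_sp;  /* stack pointer reported for the previous frame */
};

/* Return true if unwinding should stop because this frame reports the
   same stack pointer as the frame before it. Otherwise remember the
   value for the next comparison and return false. */
static bool sp_guard_should_stop(struct sp_guard *g, uint64_t sp)
{
    assert(g != NULL);
    if (g->have_prev && g->prev_sp == sp)
        return true;
    g->have_prev = true;
    g->prev_sp = sp;
    return false;
}

/* Walk a list of per-frame stack pointers (innermost frame first) and
   count how many frames are accepted before the guard trips. */
static size_t count_frames(const uint64_t *sps, size_t n)
{
    struct sp_guard g = { false, 0 };
    size_t shown = 0;
    for (size_t i = 0; i < n; i++) {
        if (sp_guard_should_stop(&g, sps[i]))
            break;
        shown++;
    }
    return shown;
}
```

In a real `dwfl_getthread_frames` callback, the guard would live in the `arg` pointer, the stack pointer would be read with `dwfl_frame_reg` (DWARF register 7 is `rsp` on x86-64), and the callback would return `DWARF_CB_ABORT` when `sp_guard_should_stop` returns true.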
[Bug libdw/29430] New: `dwarf_getscopes` fails after a8493c1
https://sourceware.org/bugzilla/show_bug.cgi?id=29430

            Bug ID: 29430
           Summary: `dwarf_getscopes` fails after a8493c1
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Apologies, but I haven't yet succeeded in creating a self-contained reproducer for this issue.

When calling `dwarf_getscopes` on a (PGO and LTO) binary (a Python interpreter built with GCC 9.3.1 against glibc 2.12, which is a relatively old glibc version), I'm seeing failures with elfutils 0.187 that I didn't see with elfutils 0.179.

We were able to bisect the problem down to commit a8493c1, and we see that reverting that commit causes `dwarf_getscopes` to succeed even with elfutils 0.187.

That commit is:

    libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

    Some gcc -flto versions imported other top-level compile units,
    skip those. Otherwise we'll visit various DIE trees multiple times.

    Note in the testcase that with newer GCC versions function foo is
    fully inlined and does appear only once (as declared, but not as
    separate subprogram).

    Signed-off-by: Mark Wielaard

Any idea why this might have broken PC resolution for us?
[Bug libdw/29430] `dwarf_getscopes` fails after a8493c1
https://sourceware.org/bugzilla/show_bug.cgi?id=29430

Matt Wozniski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #1 from Matt Wozniski ---
Well - never mind. Our problem turned out not to be an issue with `dwarf_getscopes` at all, but a bug in our unwinder that only occurred when `dwarf_getscopes` finds 0 scopes.

Our buggy code was working with elfutils 0.179 because `dwarf_getscopes` would erroneously return extra scopes due to DIE trees being visited multiple times. We'd ignore those scopes because `dwarf_tag(scope) != DW_TAG_inlined_subroutine`, but since the count was nonzero, our bug that triggers only when 0 scopes are found wouldn't occur.

After `dwarf_getscopes` was fixed, it began returning 0 in cases where it previously hadn't, and our code failed to handle that case properly, in a way that had never been noticed.

Sorry for the false alarm!
[Bug libdw/29434] New: Memory leak in `dwarf_getscopes`
https://sourceware.org/bugzilla/show_bug.cgi?id=29434

            Bug ID: 29434
           Summary: Memory leak in `dwarf_getscopes`
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Found by valgrind:

    ==173857== 64 bytes in 2 blocks are definitely lost in loss record 3,155 of 8,232
    ==173857==    at 0x480B7BB: malloc (vg_replace_malloc.c:380)
    ==173857==    by 0x90143DC: pc_record (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
    ==173857==    by 0x9019ABC: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
    ==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
    ==173857==    by 0x9019A69: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
    ==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
    ==173857==    by 0x9014691: dwarf_getscopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)

`dwarf_getscopes` ends with:

```
  if (result > 0)
    *scopes = a.scopes;

  return result;
```

but this is incorrect, since `a.scopes` may be non-NULL even if `result` is <= 0 and is leaked in this case since no reference is retained to it.

Seems like this needs to be:

```
  if (result > 0)
    *scopes = a.scopes;
  else
    free(a.scopes);

  return result;
```
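The ownership rule behind the proposed fix can be shown as a small standalone sketch. It is not libdw code: `scope_t` and `finish_scopes` are hypothetical stand-ins for the real `Dwarf_Die` array and the tail of `dwarf_getscopes`, and the sketch assumes the collected array came from `malloc` (or is NULL).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type; the real array in libdw holds Dwarf_Die. */
typedef struct { int tag; } scope_t;

/* Hand the collected array to the caller only on success. On failure
   or an empty result, release it here, because nobody else keeps a
   pointer to it. Returns `result` unchanged. */
static int finish_scopes(int result, scope_t *collected, scope_t **scopes)
{
    assert(scopes != NULL);
    if (result > 0)
        *scopes = collected;  /* caller now owns the array and must free it */
    else
        free(collected);      /* free(NULL) is a no-op, so no NULL check */
    return result;
}
```

With this shape, a caller only needs to `free` the output pointer when the return value is positive, and the output pointer is left untouched otherwise.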