[Bug libdw/30272] New: Unwinding multithreaded musl applications fails

2023-03-24 Thread godlygeek at gmail dot com via Elfutils-devel
https://sourceware.org/bugzilla/show_bug.cgi?id=30272

            Bug ID: 30272
           Summary: Unwinding multithreaded musl applications fails
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Unwinding multithreaded applications linked against musl libc on x86-64 seems
to fail, getting stuck on `__clone`:

TID 241:
...
#20 0x7f6f2f74f08b start
#21 0x7f6f2f75138e __clone
#22 0x7f6f2f75138e __clone
#23 0x7f6f2f75138e __clone
...
#253 0x7f6f2f75138e __clone
#254 0x7f6f2f75138e __clone
#255 0x7f6f2f75138e __clone
eu-stack: tid 241: shown max number of frames (256, use -n 0 for unlimited)


GDB seems to detect the condition that libdw is getting stuck on, emitting a
warning message:

#44 0x7f8f83e4d08b in start (p=0x7f8f836b8b00) at src/thread/pthread_create.c:203
#45 0x7f8f83e4f38e in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC

I believe it's detecting that two frames in a row have the same DWARF CFA, if I
understand correctly.


Reproducer:

docker run -it --privileged python:3.10-alpine sh

And in the container:

apk add --update musl-dbg elfutils
python3.10 -c "import os, threading; threading.Thread(target=lambda: os.system(f'eu-stack --pid={os.getpid()}')).start()"

That spawns a thread that forks a subprocess that runs `eu-stack` on its
parent, and reproduces the issue. If you remove the thread and just run:

python3.10 -c "import os; os.system(f'eu-stack --pid={os.getpid()}')"

then unwinding succeeds, ending at `_start`.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug libdw/30272] Unwinding multithreaded musl applications fails

2023-04-02 Thread godlygeek at gmail dot com via Elfutils-devel
https://sourceware.org/bugzilla/show_bug.cgi?id=30272

--- Comment #1 from Matt Wozniski  ---
I encountered this issue using `dwfl_getthread_frames`, and I've found that
calling `dwfl_frame_reg` to check if the stack pointer register was the same
for two frames in a row and breaking out if so seems to work around it. I'm not
sure if that's entirely correct, though. Are there any legitimate cases where
two different frames passed to the callback would have the same stack pointer?
My impression is that the stack pointer should change for every function call
because the return address is stored on the stack, but perhaps there are some
architectures where that isn't the case...

[Bug libdw/29430] New: `dwarf_getscopes` fails after a8493c1

2022-07-28 Thread godlygeek at gmail dot com via Elfutils-devel
https://sourceware.org/bugzilla/show_bug.cgi?id=29430

            Bug ID: 29430
           Summary: `dwarf_getscopes` fails after a8493c1
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Apologies, but I haven't yet succeeded in creating a self-contained reproducer
for this issue.

When calling `dwarf_getscopes` on a PGO- and LTO-optimized binary (a Python
interpreter built with GCC 9.3.1 against glibc 2.12, a relatively old glibc
version), I'm seeing failures with elfutils 0.187 that I didn't see with
elfutils 0.179. We were able to bisect the problem down to commit a8493c1, and
reverting that commit makes `dwarf_getscopes` succeed even with elfutils 0.187.

That commit is:

    libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

    Some gcc -flto versions imported other top-level compile units,
    skip those. Otherwise we'll visit various DIE trees multiple times.

    Note in the testcase that with newer GCC versions function foo is
    fully inlined and does appear only once (as declared, but not as
    separate subprogram).

    Signed-off-by: Mark Wielaard 

Any idea why this might have broken PC resolution for us?

[Bug libdw/29430] `dwarf_getscopes` fails after a8493c1

2022-07-29 Thread godlygeek at gmail dot com via Elfutils-devel
https://sourceware.org/bugzilla/show_bug.cgi?id=29430

Matt Wozniski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #1 from Matt Wozniski  ---
Well, never mind. Our problem turned out not to be an issue with
`dwarf_getscopes` at all, but a bug in our unwinder that only occurs when
`dwarf_getscopes` finds 0 scopes. Our buggy code happened to work with elfutils
0.179 because `dwarf_getscopes` would erroneously return extra scopes due to
DIE trees being visited multiple times. We ignored those extra scopes because
`dwarf_tag(scope) != DW_TAG_inlined_subroutine`, but since the result was never
0, the bug in our 0-scope path never triggered.

After `dwarf_getscopes` was fixed, it began returning 0 in cases where it
previously hadn't, which exposed our long-unnoticed failure to handle that case.

Sorry for the false alarm!

[Bug libdw/29434] New: Memory leak in `dwarf_getscopes`

2022-08-01 Thread godlygeek at gmail dot com via Elfutils-devel
https://sourceware.org/bugzilla/show_bug.cgi?id=29434

            Bug ID: 29434
           Summary: Memory leak in `dwarf_getscopes`
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: godlygeek at gmail dot com
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Found by valgrind:

==173857== 64 bytes in 2 blocks are definitely lost in loss record 3,155 of 8,232
==173857==    at 0x480B7BB: malloc (vg_replace_malloc.c:380)
==173857==    by 0x90143DC: pc_record (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9019ABC: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9019A69: walk_children (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x901974A: __libdw_visit_scopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)
==173857==    by 0x9014691: dwarf_getscopes (in /path/to/python_extension_module.cpython-38-x86_64-linux-gnu.so)

`dwarf_getscopes` ends with:
```
  if (result > 0)
    *scopes = a.scopes;

  return result;
```

but this is incorrect: `a.scopes` may be non-NULL even when `result` is <= 0,
and in that case it is leaked because no reference to it is retained. It seems
this needs to be:
```
  if (result > 0)
    *scopes = a.scopes;
  else
    free(a.scopes);

  return result;
```