[Bug libdw/32713] New: elfutils fails to symbolize core dumps created by Linux 6.12+

2025-02-18 Thread michael+sourceware at stapelberg dot ch
https://sourceware.org/bugzilla/show_bug.cgi?id=32713

            Bug ID: 32713
           Summary: elfutils fails to symbolize core dumps created by
                    Linux 6.12+
           Product: elfutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libdw
          Assignee: unassigned at sourceware dot org
          Reporter: michael+sourceware at stapelberg dot ch
                CC: elfutils-devel at sourceware dot org
  Target Milestone: ---

Hey folks, thanks for maintaining elfutils!

I noticed that on my Linux machine, core dumps collected by systemd-coredump(8)
no longer show symbol names in the backtrace displayed by 'coredumpctl info'.

Digging deeper, the issue is unrelated to systemd-coredump and can be
reproduced by calling eu-stack directly, too.

I used the following example program:

% echo 'int main() { char *no = 0; *no = 0x23; }' > segfault.c
% gcc -g -o segfault segfault.c -Wall -static

You can find the compiled output at
https://t.zekjur.net/2025-02-17-elfutils/segfault (just in case).

When I let this program crash on Linux 6.13, I get a core dump like
https://t.zekjur.net/2025-02-17-elfutils/core.segfault.1000.6158dd3b52af4b8384c103a8a336fc02.2913783.173980684300.zst

When I let this program crash on Linux 6.1, I get a core dump like
https://t.zekjur.net/2025-02-17-elfutils/core.segfault.0.8f168ad538ed480eab20ebbab491d953.1079959.173980544400.zst

(You can easily tell the two apart by the .0. vs. .1000. uid in the front of
the filename.)
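
For anyone scripting against these files, here is a minimal Python sketch that
splits such a file name into its fields. Only the uid position is stated in
this report; the pid position is inferred from the "PID" lines that eu-stack
prints for the same files, and the names "boot_id" and "timestamp" for the
remaining two fields are assumptions, not something this report confirms.
parse_core_name is a hypothetical helper, not part of systemd or elfutils.

```python
def parse_core_name(name):
    """Split a core file name like the two above into its dot-separated fields."""
    # Drop the compression suffix used by the downloadable copies.
    if name.endswith(".zst"):
        name = name[: -len(".zst")]
    # Split from the right so that a dot inside the program name stays intact.
    head, uid, boot_id, pid, timestamp = name.rsplit(".", 4)
    prefix, _, comm = head.partition(".")
    if prefix != "core":
        raise ValueError("not a core file name: " + name)
    return {
        "comm": comm,
        "uid": int(uid),         # position stated in the report
        "boot_id": boot_id,      # assumed meaning
        "pid": int(pid),         # matches the PID printed by eu-stack
        "timestamp": timestamp,  # assumed meaning
    }


new = parse_core_name(
    "core.segfault.1000.6158dd3b52af4b8384c103a8a336fc02.2913783.173980684300.zst"
)
old = parse_core_name(
    "core.segfault.0.8f168ad538ed480eab20ebbab491d953.1079959.173980544400.zst"
)
print(new["uid"], new["pid"])
print(old["uid"], old["pid"])
```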

Now, when I check with elfutils 0.190, elfutils 0.192 or
elfutils-0.192-42gb16f441c from git (current git HEAD), the coredump from Linux
6.1 can be symbolized; the one from Linux 6.13 cannot:

% eu-stack --version
eu-stack (elfutils) 0.190
Copyright (C) 2023 The elfutils developers <http://elfutils.org/>.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% eu-stack -e ./segfault --core ./core.segfault.0.8f168ad538ed480eab20ebbab491d953.1079959.173980544400
PID 1079959 - core
TID 1079959:
#0  0x00402e15 main
#1  0x00403398 __libc_start_call_main
#2  0x004056b0 __libc_start_main_impl
#3  0x00402d05 _start

% eu-stack -e ./segfault --core ./core.segfault.1000.6158dd3b52af4b8384c103a8a336fc02.2913783.173980684300
PID 2913783 - core
TID 2913783:
#0  0x00402e15
#1  0x00403398
#2  0x004056b0
#3  0x00402d05
eu-stack: dwfl_thread_getframes tid 2913783 at 0x402d04 in : No DWARF information found

When I compare the two core dumps, I notice:

% eu-readelf -a ./core.segfault.1000.6158dd3b52af4b8384c103a8a336fc02.2913783.173980684300 > core.no_syms.readelf.txt

% eu-readelf -a ./core.segfault.0.8f168ad538ed480eab20ebbab491d953.1079959.173980544400 > core.has_syms.readelf.txt

% diff -u core.no_syms.readelf.txt core.has_syms.readelf.txt
--- core.no_syms.readelf.txt    2025-02-17 16:12:45.641427118 +0100
+++ core.has_syms.readelf.txt   2025-02-17 16:12:51.624711599 +0100
[…]
   CORE 206  FILE
 5 files:
+  00400000-00401000 00000000 4096    /tmp/segfault
   00401000-0047c000 00001000 503808  /tmp/segfault
   0047c000-004a2000 0007c000 155648  /tmp/segfault
-  00400000-00401000 00000000 4096    /tmp/segfault
-  004a7000-004a9000 000a6000 8192    /tmp/segfault
   004a2000-004a7000 000a1000 20480   /tmp/segfault
+  004a7000-004a9000 000a6000 8192    /tmp/segfault
[…]
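
To make the difference concrete, here is a small Python sketch that checks the
order of the five file mappings in each core dump. The values are taken from
the diff above (the 4096-byte mapping is assumed to start at 0x00400000 with
file offset 0). The helper names are hypothetical and this is not elfutils
code; it only shows that the Linux 6.1 list is sorted by start address while
the Linux 6.13 list is not. Whether that ordering is what trips up elfutils is
a suspicion at this point, not a confirmed diagnosis.

```python
# (start, end, file offset) per mapping, in the order each core dump lists them.
LINUX_6_1 = [
    (0x00400000, 0x00401000, 0x00000000),
    (0x00401000, 0x0047C000, 0x00001000),
    (0x0047C000, 0x004A2000, 0x0007C000),
    (0x004A2000, 0x004A7000, 0x000A1000),
    (0x004A7000, 0x004A9000, 0x000A6000),
]
LINUX_6_13 = [
    (0x00401000, 0x0047C000, 0x00001000),
    (0x0047C000, 0x004A2000, 0x0007C000),
    (0x00400000, 0x00401000, 0x00000000),
    (0x004A7000, 0x004A9000, 0x000A6000),
    (0x004A2000, 0x004A7000, 0x000A1000),
]


def is_sorted_by_start(mappings):
    """Return True if every mapping starts at or after the previous one."""
    return all(a[0] <= b[0] for a, b in zip(mappings, mappings[1:]))


def sort_by_start(mappings):
    """Return a copy of the mappings ordered by start address."""
    return sorted(mappings, key=lambda m: m[0])


print(is_sorted_by_start(LINUX_6_1))
print(is_sorted_by_start(LINUX_6_13))
print(sort_by_start(LINUX_6_13) == LINUX_6_1)
```

Both lists contain the same five mappings; only their order differs.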

I then tried different Linux kernel versions until I found that Linux 6.12 is
the first kernel where things break.

My suspicion is that commit
https://github.com/torvalds/linux/commit/7d442a33bfe817ab2a735f3d2e430e36305354ea
is responsible for the breakage.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug libdw/32713] elfutils fails to symbolize core dumps created by Linux 6.12+

2025-02-19 Thread michael+sourceware at stapelberg dot ch
https://sourceware.org/bugzilla/show_bug.cgi?id=32713

--- Comment #3 from Michael Stapelberg ---
(In reply to Mark Wielaard from comment #2)
> See also this kernel thread:
> https://lore.kernel.org/all/39fc2866-dff3-43c9-9d40-e8ff30a21...@juniper.net/
> Looks like the kernel people believe this is "in spec" so doesn't really
> break user space handling. So we'll have to figure out how to work around it
> somehow.

OTOH, this message is in support of a revert:
https://lore.kernel.org/all/a3owf3zywbnntq4h4eytraeb6x7f77lpajszzmsy5d7zumg3tk@utzxmomx6iri/
— so maybe we’ll see a revert after all.

Thanks for having a look, and let me know if you want any additional details.
