The current version of makedumpfile in Noble is insufficient to generate
a readable dumpfile on kernels newer than 6.8. The following two commits
are needed:

[1] https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
[2] https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda

** Patch added: "noble_makedumpfile.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/2125145/+attachment/5917828/+files/noble_makedumpfile.debdiff

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to crash in Ubuntu.
https://bugs.launchpad.net/bugs/2125145

Title:
  [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening
  Kernel Crashdump Files Generated on the Latest HWE Kernel

Status in crash package in Ubuntu:
  Confirmed
Status in makedumpfile package in Ubuntu:
  Fix Released
Status in crash source package in Noble:
  New
Status in makedumpfile source package in Noble:
  New
Status in crash source package in Plucky:
  New
Status in makedumpfile source package in Plucky:
  Fix Released
Status in crash source package in Questing:
  New
Status in makedumpfile source package in Questing:
  Fix Released
Status in crash source package in Resolute:
  Confirmed
Status in makedumpfile source package in Resolute:
  Fix Released

Bug description:
  Note: Original description is at the bottom of this report

  [Impact]

  The current versions of Makedumpfile and Crash in the -updates pocket
  on Noble do not support the latest hardware enablement kernel for that
  platform, which is 6.14. There are several architecture-dependent and
  kernel flavor-dependent behaviours that I will outline below, but the
  steps to reproduce are the same.

  Reproducer steps:
  -----------------

  Boot into a hardware enablement kernel. For example, on arm64 use the
  6.14.0-1008-nvidia-64k kernel:

  KERNEL_VERSION=6.14.0-1008-nvidia-64k
  DISTRO=noble

  sudo apt update
  sudo apt install ubuntu-dbgsym-keyring
  echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe multiverse
  deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe multiverse" | \
    sudo tee /etc/apt/sources.list.d/ddebs.list
  sudo apt update
  sudo apt install linux-image-${KERNEL_VERSION}
  sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym

  Modify GRUB's kernel command line (GRUB_CMDLINE_LINUX_DEFAULT in
  /etc/default/grub) to reserve memory for the crash kernel:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar
  sudo update-grub
  sudo apt install kexec-tools kdump-tools crash makedumpfile
  sudo systemctl enable kdump-tools
  sudo systemctl start kdump-tools
  sudo reboot
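
  Before you trigger the crash, confirm that the reboot reserved memory
  for the crash kernel and loaded it. The following is a minimal POSIX
  shell sketch and is not part of kdump-tools. The function names are
  hypothetical. It assumes the standard /proc/cmdline and
  /sys/kernel/kexec_crash_loaded interfaces; the latter reads 1 once a
  crash kernel is loaded.

```shell
# Sketch: check kdump readiness before triggering a panic.
# File paths default to the usual /proc and /sys locations but can be
# overridden, so the logic can be tried without root or a crash kernel.

# Print the value of crashkernel= from a kernel command line file.
# Fails (non-zero status) if no crashkernel= token is present.
crashkernel_param() {
    # tr puts one cmdline token per line; sed keeps only crashkernel= values.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p' | grep .
}

# Succeed only if crashkernel= is set AND a crash kernel is loaded.
check_kdump_ready() {
    local size loaded
    size=$(crashkernel_param "${1:-/proc/cmdline}") || {
        echo "no crashkernel= on the kernel command line" >&2
        return 1
    }
    loaded=$(cat "${2:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)
    if [ "$loaded" != 1 ]; then
        echo "crashkernel=$size is set, but no crash kernel is loaded" >&2
        return 1
    fi
    echo "kdump ready (crashkernel=$size)"
}
```

  For example, run `check_kdump_ready && echo c | sudo tee
  /proc/sysrq-trigger`. kdump-tools' own `kdump-config show` reports
  similar information.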

  echo c | sudo tee /proc/sysrq-trigger

  After the machine recovers, open the dump:

  crash /usr/lib/debug/boot/<kernel-dbgsym> /var/crash/<dump-dir>/<dump-file>
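
  Filling in these placeholders by hand is error-prone when several dumps
  exist. The hedged sketch below prints the crash command for the newest
  dump. The function names are hypothetical. It makes two assumptions.
  First, it assumes the directory layout seen in the original description
  (/var/crash/<timestamp>/dump.<timestamp>). Second, it assumes the dump's
  vmcoreinfo text is stored uncompressed; this is the OSRELEASE= line
  visible in the strace tail quoted there.

```shell
# Sketch: build the crash command for the newest kdump-tools dump.
# Assumes /var/crash/<YYYYMMDDhhmm>/dump.<YYYYMMDDhhmm> and debug kernels at
# /usr/lib/debug/boot/vmlinux-<release>, as in the listings in this report.

# Print the newest dump. Timestamped directory names sort chronologically
# and globs expand in sorted order, so the last match is the newest.
latest_dump() {
    local f newest=
    for f in "${1:-/var/crash}"/*/dump.*; do
        [ -f "$f" ] && newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Print the kernel release recorded in the dump's vmcoreinfo text.
dump_release() {
    # -a: scan binary data as text; -o: print only the match; -m 1: stop early.
    grep -a -o -m 1 'OSRELEASE=[^[:space:]]*' "$1" | cut -d= -f2
}

# Print "crash <debug vmlinux> <dump>" for the newest dump.
crash_cmd_for_latest() {
    local dump release
    dump=$(latest_dump "${1:-/var/crash}") || return 1
    release=$(dump_release "$dump")
    [ -n "$release" ] || return 1
    printf 'crash %s/vmlinux-%s %s\n' "${2:-/usr/lib/debug/boot}" "$release" "$dump"
}
```

  The release is taken from the dump rather than from `uname -r`. The
  debug vmlinux must match the kernel that crashed, not whichever kernel
  is running after recovery. The command is printed for review rather than
  executed.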

  Results on Arm64
  ----------------

  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  please wait... (gathering task table data)
  crash: page excluded: kernel virtual address: ffff07ffa042d8e0  type: "xa_node.slots[off]"

  Results on amd64
  ----------------

  On an amd64 machine, a kernel such as linux-image-6.14.0-29-generic
  causes crash to fail to open the dump. No error is printed, but the
  prompt is never presented:

  crash 8.0.4
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  # Program exits and no prompt is presented
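
  When checking many kernel and architecture combinations, it helps to
  tell these symptoms apart mechanically. The sketch below assumes the
  crash session's output has been captured to a file along with its exit
  status. The signature strings are the ones quoted in this report,
  including the "cannot determine starting stack frame" warnings under
  [Investigation and summary of changes]. The labels are hypothetical.

```shell
# Sketch: label a crash session from its captured output and exit status.
# Signature strings come from this report; the labels are made up here.
classify_crash_log() {
    local log=$1 status=$2
    if grep -q 'page excluded:' "$log"; then
        echo page-excluded       # arm64 with the unpatched makedumpfile
    elif grep -q 'cannot determine starting stack frame' "$log"; then
        echo no-stack-frame      # arm64, patched makedumpfile + crash 8.0.4
    elif [ "$status" -ne 0 ]; then
        echo "exited-$status"    # amd64: exits (status 1) before the prompt
    else
        echo ok
    fi
}
```

  The amd64 failure prints nothing distinctive. It can only be detected
  through the exit status, which is why the status is passed in alongside
  the log.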

  [Test Plan]

  * Ensure that the proposed combination of makedumpfile and crash is
capable of generating and subsequently opening crashdumps on the HWE and GA
kernels available for each series. Here is the mapping at the time of writing:
  Noble GA: 6.8
  Noble HWE: 6.14
  Plucky (interim release, no HWE): 6.14
  Questing (interim release, no HWE): 6.17
  Resolute (development): 6.17 (as of Oct. 14th 2025)

  * Ensure all of crash's commands produce the expected output (e.g. ps,
  mount, files, vm, vtop, runq, etc.)

  * If bugs are found in generating and reading crashdumps on the HWE
  kernel on other architectures (s390x, etc.), this test plan can be
  expanded to include those.
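
  The matrix and command checks above could be driven by a small script.
  This is only a sketch, and the function names are hypothetical. The
  kernel versions are copied from the mapping above. vtop is omitted
  because it needs an address argument. The sketch also assumes that
  crash reads commands from standard input when stdin is not a terminal.

```shell
# Sketch of the test-plan matrix; versions are copied from the mapping above
# (at the time of writing) and will need updating as the series move on.
kernels_to_test() {
    case $1 in
        noble)    echo "6.8 6.14" ;;   # GA and HWE
        plucky)   echo "6.14" ;;
        questing) echo "6.17" ;;
        resolute) echo "6.17" ;;       # development series, as of Oct. 14th 2025
        *) echo "unknown series: $1" >&2; return 1 ;;
    esac
}

# Write the test-plan commands, then "quit", to a file.
# vtop is left out because it needs an address argument.
write_crash_smoke_cmds() {
    printf '%s\n' ps mount files vm runq quit > "$1"
}

# Run crash over one dump with commands on stdin. -s suppresses the banner;
# coreutils timeout guards against hangs (e.g. the module-symbol loop seen
# on Questing) and makes the command exit with status 124 if it fires.
run_crash_smoke() {
    local vmlinux=$1 dump=$2 cmds=$3 log=$4
    timeout 600 crash -s "$vmlinux" "$dump" < "$cmds" > "$log" 2>&1
}
```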

  [Where Problems Could Occur]
  * Crash and Makedumpfile are designed to be backwards-compatible, so the risk
of regression when backporting a commit is low, though not zero. It is therefore
important to ensure that the proposed combination of Makedumpfile and crash does
not break existing environments, e.g. the GA kernel.

  * The matrix of hardware and kernel versions (including derivative /
  cloud kernels) to test against is extensive. It's possible that the
  commits identified to solve the known problems will not be
  comprehensive. For example, cpu architectures and kernels not in the
  test matrix may require additional commits to be backported.

  [Other Info]

  * Support/SEG are currently having conversations with the kernel team
  about the potential to proactively SRU / MRE the latest upstream crash
  version, and potentially Makedumpfile as well, alongside -hwe kernel
  releases to avoid this sort of regression in the future. However, we
  understand this would require an SRUExceptionPolicy to be approved and
  published.

  [Investigation and summary of changes]

  We have identified that at least two commits are needed in Makedumpfile:
  [1] https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
  [2] https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda

  These patches compensate for a change in the kernel's mapping of
  memory. Using the patched Makedumpfile helps, but it is not
  sufficient. With the patches included in Makedumpfile (or using the tip
  of upstream master), opening the dump with the currently distributed
  crash results in the following errors:

  e.g. Patched Makedumpfile with crash 8.0.4 on Arm64:
  ----------------------------------------------------
  ...
  WARNING: cannot determine starting stack frame for task ffffd574e21b4800

  WARNING: cannot determine starting stack frame for task ffff07ff83296300

  WARNING: cannot determine starting stack frame for task ffff07ff83293f80

  WARNING: cannot determine starting stack frame for task ffff07ff83a04700

  WARNING: cannot determine starting stack frame for task ffff08010507c400
        KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
      DUMPFILE: /var/crash/patched_mdf/dump.202509191531  [PARTIAL DUMP]
          CPUS: 128 [OFFLINE: 127]
          DATE: Thu Jan  1 00:00:00 UTC 1970
        UPTIME: 00:13:38
  LOAD AVERAGE: 0.12, 0.16, 0.10
         TASKS: 1573
      NODENAME: penguru
       RELEASE: 6.14.0-1008-nvidia-64k
       VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
       MACHINE: aarch64  (unknown Mhz)
        MEMORY: 63.8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 7886
       COMMAND: "tee"
          TASK: ffff08010507c400  [THREAD_INFO: ffff08010507c400]
           CPU: 85
         STATE: TASK_RUNNING (PANIC)

  On Amd64
  --------
  Crash still fails to open.

  Therefore, in addition to the above Makedumpfile commits, crash
  requires some patching. With the above two commits applied to
  Makedumpfile, I bisected crash on amd64 and arm64.
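
  One way to automate such a bisect is sketched below. It illustrates the
  approach under stated assumptions and is not the procedure used for this
  report. The goal is the commit that fixes the dump rather than one that
  breaks it, so git's terms are renamed: "old" means broken and "new"
  means fixed. The revision placeholders, helper names and log paths are
  hypothetical. Each step rebuilds crash with a plain make, which is slow.

```shell
# Sketch: helpers for `git bisect run` when hunting for the crash commit that
# FIXES a dump. git treats exit 0 as "old" and 1-127 (except 125) as "new",
# so start the bisect with renamed terms (placeholders in angle brackets):
#   git bisect start --term-old broken --term-new fixed <fixed-rev> <broken-rev>

# Map one step's results to a bisect exit code:
#   125 = skip (did not build), 1 = "fixed" (new), 0 = "broken" (old).
bisect_verdict() {
    local build_status=$1 run_status=$2 log=$3
    [ "$build_status" -eq 0 ] || return 125
    if [ "$run_status" -eq 0 ] &&
       ! grep -q -e 'page excluded:' -e 'cannot determine starting stack frame' "$log"; then
        return 1
    fi
    return 0    # non-zero exit (incl. timeout's 124) or a known warning
}

# One bisect step, run from the top of a crash source checkout.
bisect_step() {
    local vmlinux=$1 dump=$2 logs=${TMPDIR:-/tmp} b r
    make > "$logs/crash-bisect-build.log" 2>&1
    b=$?
    [ "$b" -eq 0 ] || return 125
    printf 'ps\nquit\n' |
        timeout 600 ./crash -s "$vmlinux" "$dump" > "$logs/crash-bisect-run.log" 2>&1
    r=$?
    bisect_verdict "$b" "$r" "$logs/crash-bisect-run.log"
}
```

  `git bisect run` needs a command to execute, so the helpers could be
  invoked as, for example, `git bisect run sh -c '. /path/to/bisect-helpers.sh;
  bisect_step "$1" "$2"' sh "$VMLINUX" "$DUMP"`. The file path and the two
  variables are placeholders.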

  On the amd64 crash side, I have identified that [3], cherry-picked in
  isolation, is sufficient:
  [3] https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad

  I have also found that cherry-picking [4] and [5] resolves the issue on
  arm64 hardware in testflinger (using the machine agent penguru):
  [4] https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7
  [5] https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3

  At this point, crash's commands such as mount, files, vm, etc. were
  still broken. To resolve this, [6] and [7] are needed:

  [6] https://github.com/crash-utility/crash/commit/3d60d9d40457239683a5f20b01437db94f964fb8
  [7] https://github.com/crash-utility/crash/commit/2795136a515446b798ebbfa257c97f0ca6ecb8ec

  To SRU for Noble, crash must also work on Plucky, Questing, and Resolute.
The current version of makedumpfile on all of those series was found to be
sufficient, so no makedumpfile SRU is required there. However, for crash:
  * Plucky uses the 6.14 kernel, so no commits beyond those above are needed;
in fact, because Plucky already carries a newer crash version, only [7] is
needed.
  * Questing uses the 6.17 kernel. No issues other than the one fixed by [7]
were observed on arm64, but on amd64 an infinite loop was observed while gdb
loaded module symbols. This is fixed in [8].
  * Resolute will ship with a kernel newer than 6.17 but, as of October 14th,
2025, is based on 6.17. The package currently in Debian unstable, which will
autosync to Resolute, does not contain the required fixes, so Resolute will
also require an SRU with [7] and [8] unless they are superseded by an upstream
(Debian) version bump.

  [8] https://github.com/crash-utility/crash/commit/e44a9a9d808c83fb846060f65e5aaa9d30b6e2c4
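
  For reference, the per-series requirements above can be summarized as a
  lookup. This is a sketch, and the function name is hypothetical. The
  SHAs are copied from [1]-[8]. The listed order is an assumption and may
  need adjusting if a cherry-pick does not apply cleanly.

```shell
# Sketch: upstream commits this report identifies, per package and series.
MDF_1=985e575253f1c2de8d6876cfe685c68a24ee06e1
MDF_2=bad2a7c4fa75d37a41578441468584963028bdda
CRASH_3=6752571d8d782d07537a258a1ec8919ebd1308ad   # amd64
CRASH_4=3879e9104826d5ae14a0824ec47ab60056a249a7   # arm64
CRASH_5=968debd0d5979dd9ddca3af0766bad714dbd51e3   # arm64
CRASH_6=3d60d9d40457239683a5f20b01437db94f964fb8   # mount, files, vm, ...
CRASH_7=2795136a515446b798ebbfa257c97f0ca6ecb8ec   # mount, files, vm, ...
CRASH_8=e44a9a9d808c83fb846060f65e5aaa9d30b6e2c4   # amd64 module-symbol loop

# Print the commits a package needs on a series (empty output if none).
commits_for() {
    case $1/$2 in
        makedumpfile/noble) echo "$MDF_1 $MDF_2" ;;
        makedumpfile/plucky|makedumpfile/questing|makedumpfile/resolute) echo "" ;;
        crash/noble)    echo "$CRASH_3 $CRASH_4 $CRASH_5 $CRASH_6 $CRASH_7" ;;
        crash/plucky)   echo "$CRASH_7" ;;
        # Resolute: unless superseded by a Debian version bump.
        crash/questing|crash/resolute) echo "$CRASH_7 $CRASH_8" ;;
        *) echo "unknown package/series: $1/$2" >&2; return 1 ;;
    esac
}
```

  For example, in an upstream crash checkout at the version Noble ships
  (8.0.4), `git cherry-pick -x $(commits_for crash noble)` would apply the
  Noble set. The -x option records each upstream SHA in the commit
  message. For the Debian packaging, each commit could instead be exported
  with `git format-patch -1 <sha>` into debian/patches.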

  PPA with all of the packages built (except Resolute):
  https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lp2125145

  --------------------------------------------------------------

  Original Description:
  =====================

  24.04 LTS,
  Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  Problem Description:
  The crash utility exits with error code 1 when attempting to analyze
  kernel crash dumps.

  Set up kdump and generated a kernel panic using “echo 1 >
  /proc/sys/kernel/sysrq”, but crash cannot access the dump:

  # crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic dump.202509161821

  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  Copyright (C) 2015, 2021  VMware, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  GNU gdb (GDB) 10.2
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-pc-linux-gnu".
  Type "show configuration" for configuration details.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  # echo $?
  1

  Running as root, and the file is readable:

  $ :/var/crash/202509161821# ls -l
  total 299144
  -rw------- 1 root whoopsie    119627 Sep 16 18:21 dmesg.202509161821
  -rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821

  The symbol file is present:

  # ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
  -rw-r--r-- 1 root root 450705920 Aug 14 18:02 /usr/lib/debug/boot/vmlinux-6.14.0-29-generic

  tail of strace:

  14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661360 write(1, "\n", 1
  )       = 1 <0.000119>
  14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
  14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 <0.000011>
  14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such file or directory) <0.002921>
  14:06:20.664817 exit_group(1)           = ?
  14:06:20.690105 +++ exited with 1 +++

  full crash strace https://filebin.net/custom-bin/crash.strace.1
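
  A trace in the format above, with microsecond wall-clock timestamps and
  per-call durations, can be reproduced with strace's -tt and -T options.
  A minimal sketch follows, and the helper names are hypothetical.

```shell
# Sketch: capture a crash run with strace (-tt: microsecond timestamps,
# -T: time spent in each call) and show what happened just before exit.
trace_crash() {
    local vmlinux=$1 dump=$2 out=${3:-crash.strace}
    # stdin from /dev/null so crash exits at EOF if it does reach a prompt.
    strace -tt -T -o "$out" crash "$vmlinux" "$dump" < /dev/null
}

# Print the N lines (default 5) before the first exit_group call, plus it.
tail_before_exit() {
    grep -B "${2:-5}" -m 1 'exit_group(' "$1"
}
```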

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: crash 8.0.4-1ubuntu2
  ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
  Uname: Linux 6.14.0-29-generic x86_64
  ApportVersion: 2.28.1-0ubuntu3.8
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Thu Sep 18 20:21:26 2025
  InstallationDate: Installed on 2025-09-04 (14 days ago)
  InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 (20250215)
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: crash
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/crash/+bug/2125145/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
