Hi Jo,

Thanks for the debdiffs.

Lunar and Jammy (1 patch each) look good; I just removed '0001' from
the .patch file names. Focal has 6 patches and required more attention
and changes (which I made), and the patches look good (some notes below
for documentation/other reviewers).

The (updated) debdiffs built correctly on ppa:mfo/lp2024479 on the
supported architectures (amd64, arm64, armhf, ppc64el, s390x), and I'm
confident in your testing, to be performed during -proposed verification.

So, uploaded to F/J/L.

Thanks,
Mauricio

...

Focal:

1) all Origin: links had wrong commit IDs; fixed.

2) there are commits to fix 3 different things, IIUIC,
   all of which are needed for big arm64 systems:
   (patches 1-2) phys_offset on large memory systems;
   (patches 3-5) many memory regions in /proc/iomem;
   (patch 6) split / memory regions for crash kernel;

3) patch 1 is mostly a big code movement, which I followed,
   and the few small changes in existing functions are for
   the purpose described (and are additions, not changes).

4) patch 2 adds code and hooks it into existing code.

5) patch 3 simplifies the function call path, which is OK,
   and makes a more significant logic change, but it should
   still do the same thing (parse the /proc/iomem format,
   which didn't change), and it has no fixup commits.

6) patch 4 adds helpers for patches 3 and 5.

7) btw, the order of patches 3/4 was swapped (3 depends on 4); fixed.

8) patch 5 uses the helpers to change some existing code
   from static to dynamic allocation.

9) patch 6 (also added to J/L/M) does likewise for the number
   of crash kernel / usable memory regions.

Fix-up commits: none strictly required.
---

I checked for newer commits (after the 1st) in the modified files:
(cat lp2024479_focal-v2.debdiff | grep '^+--- a/' | cut -d/ -f2- | sort -u)

$ git log --oneline f4ce0706d9574aecb7d4aa9af7208a1bc9b6afb4..origin/master -- \
    kexec/arch/arm64/{kexec,crashdump}-arm64.{c,h} \
    kexec/mem_regions.{c,h} \
    util_lib/Makefile \
    vmcore-dmesg/Makefile \
    vmcore-dmesg/vmcore-dmesg.c

Only 4 commits were applicable to these code changes, AFAICT.

There are 3 which are cleanup/style/optional, with no functional impact:
- commit 545c811050a3 ("Cleanup: remove the read_elf_kcore()")
- commit 14ad054e7baa ("Fix an error definition about the variable 'fname'")
- commit a7c4cb8e9985 ("Cleanup: move it back from util_lib/elf_info.c")

There is 1 which is more serious, but it does not impact Ubuntu kernels,
since they use a 48-bit (not 52-bit) virtual address space in M/L/J/F:
- commit 67ea2d99e135 ("arm64: make phys_offset signed")
  (... phys_offset can be negative if running 52-bits kernel on 48-bits hardware ...)

mantic:
$ git grep -r ARM64_VA_BITS origin/master-next -- debian.master/config | grep -e "'y'" -e ARM64_VA_BITS_52
<...>:CONFIG_ARM64_VA_BITS_48 policy<{'arm64': 'y'}>
<...>:CONFIG_ARM64_VA_BITS_52 policy<{'arm64-generic-64k': 'n'}>

lunar:
$ git grep -r ARM64_VA_BITS origin/master-next -- debian.master/config | grep -e "'y'" -e ARM64_VA_BITS_52
<...>:CONFIG_ARM64_VA_BITS_48 policy<{'arm64': 'y'}>
<...>:CONFIG_ARM64_VA_BITS_52 policy<{'arm64-generic-64k': 'n'}>

jammy:
$ git grep -r ARM64_VA_BITS origin/master-next -- debian.master/config | grep -e "'y'" -e ARM64_VA_BITS_52
<...>:CONFIG_ARM64_VA_BITS_48 policy<{'arm64': 'y'}>
<...>:CONFIG_ARM64_VA_BITS_52 policy<{'arm64-generic-64k': 'n', 'arm64-lowlatency-64k': 'n'}>

focal:
$ git grep -r ARM64_VA_BITS origin/master-next -- debian.master/config | grep -e "'y'" -e ARM64_VA_BITS_52
<...>:CONFIG_ARM64_VA_BITS_48 policy<{'arm64': 'y'}>

--
https://bugs.launchpad.net/bugs/2024479

Title:
  kdump fails on big arm64 systems when offset is not specified

Status in kexec-tools package in Ubuntu:
  Fix Released
Status in linux package in Ubuntu:
  Invalid
Status in linux-hwe-5.15 package in Ubuntu:
  Invalid
Status in kexec-tools source package in Focal:
  In Progress
Status in linux source package in Focal:
  Won't Fix
Status in linux-hwe-5.15 source package in Focal:
  In Progress
Status in kexec-tools source package in Jammy:
  In Progress
Status in linux source package in Jammy:
  In Progress
Status in linux-hwe-5.15 source package in Jammy:
  Invalid
Status in kexec-tools source package in Kinetic:
  Won't Fix
Status in linux source package in Kinetic:
  Won't Fix
Status in linux-hwe-5.15 source package in Kinetic:
  Invalid
Status in kexec-tools source package in Lunar:
  In Progress
Status in kexec-tools source package in Mantic:
  Fix Released

Bug description:

[Impact]
kdump fails on arm64, on machines with a lot of memory, when the offset
is not specified, e.g. when /etc/default/grub.d/kdump-tools.cfg looks like:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=4G"

If kdump-tools.cfg specifies the offset, e.g.:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=4G@4G"

it works ok.

The reason for this is that the kernel needs to allocate memory for the
crashkernel in both low and high memory. This is addressed in kernel 6.2.
In addition, kexec-tools needs to support more than one crash kernel region.
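
As an illustration of what "more than one crash kernel region" looks
like from userspace, here is a minimal Python sketch (not the
kexec-tools implementation, which is C) that collects every
"Crash kernel" entry from /proc/iomem-style text. The sample addresses
and the helper name crash_kernel_regions() are made up for the example;
with a patched kernel one would expect both a low and a high
reservation, but the actual ranges depend on the machine.

```python
import re

# One /proc/iomem entry: "<start>-<end> : <name>", hex without "0x",
# optionally indented to show nesting.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def crash_kernel_regions(iomem_text):
    """Return sorted (start, end) tuples for all 'Crash kernel' entries."""
    regions = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Crash kernel":
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    return sorted(regions)


# Hypothetical sample: one reservation in low memory, one in high memory.
SAMPLE = """\
40000000-ffffffff : System RAM
  c0000000-cfffffff : Crash kernel
100000000-1fffffffff : System RAM
  1f00000000-1fffffffff : Crash kernel
"""

if __name__ == "__main__":
    for start, end in crash_kernel_regions(SAMPLE):
        size_mib = (end - start + 1) >> 20
        print(f"{start:#x}-{end:#x} ({size_mib} MiB)")
```

A tool that assumes a single region would only see the first entry
here. To try this on a real system, pass in the contents of /proc/iomem,
read as root (unprivileged reads typically show zeroed addresses).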

[Fix]
To address this issue the following upstream commits are needed:

- From the kernel side:

commit a9ae89df737756d92f0e14873339cf393f7f7eb0
Author: Zhen Lei <thunder.leiz...@huawei.com>
Date:   Wed Nov 16 20:10:44 2022 +0800

    arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones

commit a149cf00b158e1793a8dd89ca492379c366300d2
Author: Zhen Lei <thunder.leiz...@huawei.com>
Date:   Wed Nov 16 20:10:43 2022 +0800

    arm64: kdump: Provide default size when crashkernel=Y,low is not specified

- From kexec-tools:

commit b5a34a20984c4ad27cc5054d9957af8130b42a50
Author: Chen Zhou <chenzho...@huawei.com>
Date:   Mon Jan 10 18:20:08 2022 +0800

    arm64: support more than one crash kernel regions

Affected releases: Jammy, Focal, Bionic

For Bionic we won't fix it, as we would need to backport a lot of code
and the regression potential is too high. The same applies to the
Focal 5.4 kernel. Only the Focal 5.15 HWE kernel (from Jammy) will be
fixed.

[Test Plan]
You need an arm64 machine (can be a VM too) with large memory, e.g. 128G.
Install linux-crashdump, configure the crash kernel size, and trigger a
crash.

- Failing scenario (crashkernel >= 4G, without offset "@<address>"):
  It won't work unless the offset is specified, because the crashkernel
  memory cannot be allocated. With the patches applied it works as
  expected without having to specify the offset.

- Working scenario (crashkernel < 4G, e.g., 'crashkernel=1G'):
  This must continue to work with the new patches (i.e., no
  regressions), including patched kexec-tools on an unpatched kernel
  (e.g., the 5.4 kernel on Focal).

[Regression Potential]

KERNEL 5.15 - Jammy (and Focal via the HWE kernel):

To address this problem in the 5.15 kernel we need to pull in 7 commits
(see the [Other] section for details). All the commits change code only
for the arm64 architecture, and only the code related to reserving the
crashkernel.
This means that any regression potential will affect only the arm64
architecture, and in particular the crash/kdump functionality. However,
since the reservation of the crashkernel occurs at boot, things could
potentially go wrong there as well.

kexec-tools - FOCAL:

To fix kexec-tools in Focal we need to pull in 6 commits (see the
[Other] section for details). They all cherry-pick. Four out of six
commits touch only arm64 code. Any regression potential from these
commits would concern either crashdump or kexec functionality.
Commit cf977b1af9ec67fab adds code without altering current
functionality. Commit f4ce0706d9574aecb7 adds functionality to read ELF
notes. In practice it moves the code from vmcore-dmesg.c to elf_info.c
so it can be used by other features.

kexec-tools - JAMMY, LUNAR, MANTIC:

Commit b5a34a20984c is pulled in; it cherry-picks. It changes only arm64
code. It enables kexec to recognise that the crash kernel reservation
may use more than one memory region. Things could go wrong when
gathering a crashdump.

[Other]

Commits to backport

- MANTIC:
  kernel 6.3: not affected
  kexec-tools:
    b5a34a20984c4ad27cc5054d9957af8130b42a50 arm64: support more than one crash kernel regions

- LUNAR:
  kernel 6.2: not affected
  kexec-tools:
    b5a34a20984c4ad27cc5054d9957af8130b42a50 arm64: support more than one crash kernel regions

- KINETIC: WON'T FIX
  Kinetic won't be fixed as it is EOL soon.

- JAMMY:
  kernel (5.15 kernel):
    a9ae89df737756d92f0e14873339cf393f7f7eb0 arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
    a149cf00b158e1793a8dd89ca492379c366300d2 arm64: kdump: Provide default size when crashkernel=Y,low is not specified
    4890cc18f94979b406f95708f8cb238eb2d0e5a9 arm64/mm: Define defer_reserve_crashkernel()
    8f0f104e2ab6eed4cad3b111dc206f843bda43ea arm64: kdump: Do not allocate crash low memory if not needed
    5832f1ae50600ac6b2b6d00cfef42d33a9473f06 docs: kdump: Update the crashkernel description for arm64
    944a45abfabc171fd121315ff0d5e62b11cb5d6f arm64: kdump: Reimplement crashkernel=X
    d339f1584f0acf32b32326627fa3b48e6e65c599 arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec-tools:
    b5a34a20984c4ad27cc5054d9957af8130b42a50 arm64: support more than one crash kernel regions

- FOCAL:
  Kernel 5.4: Won't fix because of high regression potential.
  Kernel 5.15 (HWE): Fixed via Jammy.
  kexec-tools:
    b5a34a20984c4ad27cc5054d9957af8130b42a50 arm64: support more than one crash kernel regions
    2572b8d702e452624bdb8d7b7c39f458e7dcf2ce arm64: kdump: deal with a lot of resource entries in /proc/iomem
    cf977b1af9ec67fabcc6a625589c49c52d07b11d kexec: add variant helper functions for handling memory regions
    f736104f533290b4ce6fbfbca74abde9ffd3888c arm64: kexec: allocate memory space avoiding reserved regions
    64c49f27d88024eaab990d2cd6069289cf853098 arm64: Add support to read PHYS_OFFSET from 'kcore' - pt_note or pt_load (if available)
    f4ce0706d9574aecb7d4aa9af7208a1bc9b6afb4 util_lib: Add functionality to read elf notes