------- Comment From hbath...@in.ibm.com 2024-10-22 14:01 EDT-------
Note that the kfence fix brings down the memory requirement for the fadump capture kernel but does not guarantee a successful dump capture. So, the fadump capture kernel hitting OOM in this case does not have to be considered a test failure. Also, note that the initially reported failure occurs much earlier in the capture kernel boot, unlike the kernel with the kfence fix patch, which made forward progress and eventually triggered the OOM killer due to insufficient memory.
The memory requirement for the fadump capture kernel is a moving target that depends on the resources and services on the system. In summary, the kfence patch is still needed.

Kowshik, please try increasing the memory reserved for fadump from 512M to 768M or 1024M and see if that results in a successful dump capture.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2060039

Title:
  [Ubuntu-24.04] FADump with recommended crash size is making the L1 hang

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [Impact]

   * L1 host hangs when triggering FADump that results in crash

  [Fix]

   * 353d7a84c214f184d5a6b62acdec8b4424159b7c 353d7a84c214 "powerpc/64s/radix/kfence: map __kfence_pool at page granularity"

  [Test Case]

   * Have an Ubuntu Server 24.04 LTS installation on ppc64el.

   * Enable FADump with 1GB: fadump=on crashkernel=1024M

   * A kernel panic will happen when the dump is triggered.

  [Regression Potential]

   * There is a certain risk of a regression, but the change only maps the memory allocated for the KFENCE pool at page granularity, reducing memory consumption when KFENCE is used.

   * On top of that, the commit has already been reviewed and accepted upstream.

   * The modifications were done and tested by IBM.

   * The fadump feature is supported only on IBM POWER systems.

  [Other]

   * The fix/commit got upstream accepted with kernel v6.11-rc4, hence Oracular (with a planned kernel of 6.11) is not affected.

  .......................

  Problem description :
  ======================
  Triggered FADump with the recommended crash kernel size. The L1 host hung.

  As per the public document https://wiki.ubuntu.com/ppc64el/Recommendations the recommended crash kernel size is 1024M for the system. But with 1024M and 2048M, the L1 hangs.
  With 4096M, the crash dump is generated and collected.

  root@ubuntu2404:~# uname -ar
  Linux ubuntu2404 6.8.0-11-generic #11-Ubuntu SMP Wed Feb 14 00:33:03 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux

  root@ubuntu2404:~# free -h
                 total        used        free      shared  buff/cache   available
  Mem:            48Gi       1.7Gi        46Gi        13Mi       687Mi        46Gi
  Swap:          8.0Gi          0B       8.0Gi

  root@ubuntu2404:~# cat /proc/cmdline
  BOOT_IMAGE=/vmlinux-6.8.0-11-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro fadump=on crashkernel=1024M

  root@ubuntu2404:~# dmesg | grep -i reser
  [    0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 51200MB)
  [    0.000000] fadump: Initialized 0x40000000 bytes cma area at 1024MB from 0x40070000 bytes of memory reserved for firmware-assisted dump
  [    0.000000] Memory: 49316672K/52428800K available (23616K kernel code, 4096K rwdata, 25536K rodata, 8832K init, 2487K bss, 2063552K reserved, 1048576K cma-reserved)
  [    0.396408] ibmvscsi 30000066: Client reserve enabled

  root@ubuntu2404:~# kdump-config show
  DUMP_MODE:        fadump
  USE_KDUMP:        1
  KDUMP_COREDIR:    /var/crash
     /var/lib/kdump/vmlinuz
  kdump initrd:
     /var/lib/kdump/initrd.img
  current state:    ready to fadump

  IBM is looking to update the crash kernel reservations section of the wiki for Power.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2060039/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp