------- Comment From michael.holz...@de.ibm.com 2016-03-31 12:50 EDT-------
(In reply to comment #11)
> I do not want to increase it beyond 128MB by default, as this amount of
> memory is reserved by default on all installations big or small. And there
> will always be newer kernel (that reserves more memory) or more devices that
> require higher values.

Ok, if we go with the 128M default, our current assumption is that kdump
for Ubuntu will fail on all LPARs. I am not sure how many customers are
aware that they have to tune the kdump settings manually. Therefore, what
about some documentation like the following:

  Depending on your system setup, it can be necessary to increase the
  default setting of the "crashkernel=" kernel parameter. It is
  recommended to test the kdump setup with "echo c > /proc/sysrq-trigger".
  If the crashkernel value is not high enough, kdump will crash with
  out-of-memory error messages. You can see these messages in the
  operating system messages. If kdump does not work, you have to
  increase the value in /etc/zipl.conf.

Do you already have similar documentation available?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1564475

Title:
  128M is not enough for kdump on s390 LPARs

Status in makedumpfile package in Ubuntu:
  Invalid

Bug description:
  == Comment: #0 - Michael Holzheu <michael.holz...@de.ibm.com> - 2016-03-31 10:59:26 ==
  With the current Ubuntu default setting "crashkernel=128M" kdump on
  LPARs crashes with out-of-memory (see attachment
  "dmesg_lpar_out_of_mem_128M.txt"). On z/VM guests 128M seems to be
  sufficient.

  One reason on our test LPAR is that a lot of devices are attached (see
  attachment "lscss_lpar.txt") which are not required for kdump but
  consume a lot of memory because the s390 CIO layer allocates data
  structures in the kernel for those devices. We can disable the devices
  by using the "cio_ignore=" kernel parameter in
  "/etc/default/kdump-tools".
  For example, on our LPAR that uses DASD 0.0.e934 for /var/crash, we
  added the following line to disable the devices:

  KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 cio_ignore=all,!condev,!0.0.e934"

  For more information on the "cio_ignore=" kernel parameter see:

  https://github.com/torvalds/linux/blob/master/Documentation/s390/CommonIO

  Even with "cio_ignore=" we still get out-of-memory with
  "crashkernel=128M". With "crashkernel=196M" and "cio_ignore=" we are
  able to create a dump on our LPAR.

  We currently do not know why kdump with "cio_ignore=" on LPAR consumes
  more memory than on z/VM guests.

  == Comment: #1 - Michael Holzheu <michael.holz...@de.ibm.com> - 2016-03-31 11:03:15 ==
  Kernel messages of kdump out-of-memory crash on LPAR with many devices
  without cio_ignore parameter and 128M crashkernel memory.

  == Comment: #2 - Michael Holzheu <michael.holz...@de.ibm.com> - 2016-03-31 11:04:10 ==
  Output of lscss showing all attached (not online) devices on the LPAR.

  == Comment: #3 - Michael Holzheu <michael.holz...@de.ibm.com> - 2016-03-31 11:07:35 ==
  To solve this issue our recommendation is:

  1) Increase "crashkernel=" default to 196M on Ubuntu for s390.
  2) Document that KDUMP_CMDLINE_APPEND with "cio_ignore=" can be used to
     decrease memory consumption for kdump on systems with many devices
     that are not required for kdump.

  The most user friendly solution would be to automatically determine
  the required kdump devices and set the correct "cio_ignore=" kernel
  parameter. But this is not trivial, because it can be difficult to
  find out the required devices for stacked setups like LVM or for
  network dump.
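
As an illustration of recommendation 2), here is a minimal shell sketch of how a "cio_ignore=" value could be assembled from the device bus IDs that kdump still needs. The helper name build_cio_ignore is hypothetical and not part of kdump-tools. It assumes the list of required devices is already known, which the report notes is the hard part for stacked setups such as LVM or network dump.

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): build a "cio_ignore=" value
# that ignores all devices except the console device and the bus IDs given
# as arguments. Finding the required bus IDs is left to the administrator.
build_cio_ignore() {
    value='cio_ignore=all,!condev'
    for dev in "$@"; do
        # Append ",!<bus-id>" so that this device stays visible to kdump.
        value=$(printf '%s,!%s' "$value" "$dev")
    done
    printf '%s\n' "$value"
}

# Example with the DASD from the report that holds /var/crash.
printf 'KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 %s"\n' "$(build_cio_ignore 0.0.e934)"
```

For the single device 0.0.e934 this prints the same KDUMP_CMDLINE_APPEND line that the report uses in "/etc/default/kdump-tools". The result should still be verified with a test dump, since the report shows that "cio_ignore=" alone was not sufficient with "crashkernel=128M" on the test LPAR.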