** Description changed:

- Placeholder - To be improved
+ [Impact]
+ * Currently, users cannot perform multiple kernel kexec loads on AWS
+ Nitro instances (KVM-based); after the 2nd or 3rd kexec, initrd
+ corruption is observed, with the following signature:
+
+ Initramfs unpacking failed: junk within compressed archive
+ [...]
+ Kernel panic - not syncing: No working init found.
+ Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
+ CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
+ Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
+ Call Trace:
+ dump_stack+0x6d/0x9a
+ ? csum_partial_copy_generic+0x150/0x170
+ panic+0x101/0x2e3
+ ? do_execve+0x25/0x30
+ ? rest_init+0xb0/0xb0
+ kernel_init+0xfb/0x100
+ ret_from_fork+0x35/0x40
+
+ * After investigation (see comment 2), we found that the Amazon ena
+ network driver doesn't provide a shutdown() handler, so the device
+ could still be performing DMA transactions to a previously valid
+ address while the new kernel boots, which would then corrupt kernel
+ memory. The following patch was proposed and fixed the issue, allowing
+ 1000 kexecs to be executed successfully with no issues observed:
+ 428c491332bc ("net: ena: Add PCI shutdown handler to allow safe kexec")
+ [ git.kernel.org/linus/428c491332bc ].
+
+ * Hence, we are requesting an SRU for this patch. It was tested
+ successfully in all supported series (4.4, 4.15 and 5.3) on Amazon
+ Nitro instances, and was reviewed/acked by the ena driver team and by a
+ kexec developer from another distro. It is worth mentioning that we
+ proposed an upstream multi-vendor discussion about this issue:
+ marc.info/?l=kexec&m=158299605013194
+
+ [Test case]
+
+ * The basic test procedure consists of performing multiple kexecs
+ sequentially. AWS does not provide a full console, so in case of
+ failure one can check the instance screenshot, or use pstore/ramoops to
+ keep dmesg in a preserved memory area and collect it after a crash. The
+ commands used to perform kexec are:
+
+ kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
+ systemctl kexec
+
+ Alternatively, one could use "--append=" instead of "--reuse-cmdline"
+ if a change in the kexec command line is desired; also, to execute the
+ kexec-loaded kernel, both "kexec -e" and "systemctl kexec" are equally
+ valid.
+
+ * In comment 3 we proposed a script/approach to auto-test kexecs, used
+ here to perform 1000 kexecs with the proposed patch.
+
+ [Regression Potential]
+
+ * Although the patch proposed here introduces a PCI shutdown handler, it
+ keeps the behavior of the remove handler identical and bases the
+ shutdown path closely on ena_remove(), changing only the netdev
+ handling, in line with other upstream drivers. It was extensively
+ tested and presented no issues. Also, it is self-contained and affects
+ only one driver, so other cloud providers and non-cloud environments
+ are not affected by the patch at all.
+
+ * A potential regression could manifest as a delay or a failure in the
+ reboot/shutdown path, and only if the ena driver is in use.
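
For illustration only, the sequential kexec procedure from [Test case] could be automated roughly as below. This is a minimal sketch and not the script from comment 3: the state file path, the TARGET and KEXEC_DRY_RUN variables and the function names are hypothetical, and the default kernel/initrd paths assume the usual Ubuntu layout under /boot. Only the kexec and systemctl invocations are taken from the description above.

```shell
#!/bin/sh
# Hypothetical kexec loop helper (a sketch, not the script from comment 3).
# A counter kept in a state file survives each kexec; on every boot the
# helper bumps the counter and kexecs again until TARGET is reached.

STATE_FILE="${STATE_FILE:-/var/tmp/kexec-count}"   # assumed location
TARGET="${TARGET:-1000}"                           # number of kexecs wanted
KERNEL="${KERNEL:-/boot/vmlinuz-$(uname -r)}"      # assumed Ubuntu layout
INITRD="${INITRD:-/boot/initrd.img-$(uname -r)}"   # assumed Ubuntu layout

# Run a command, or only print it when KEXEC_DRY_RUN=1 (hypothetical knob).
run() {
    if [ "${KEXEC_DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Perform one iteration: stop at TARGET, otherwise record progress and kexec.
kexec_once() {
    count=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    count=${count:-0}
    if [ "$count" -ge "$TARGET" ]; then
        echo "done: $count kexecs completed"
        return 0
    fi
    # Record the attempt before jumping into the new kernel.
    echo $((count + 1)) > "$STATE_FILE"
    run kexec -l "$KERNEL" --initrd "$INITRD" --reuse-cmdline &&
        run systemctl kexec
}

# Only act when explicitly asked to, e.g. "sh kexec-loop.sh run".
if [ "${1:-}" = "run" ]; then
    kexec_once
fi
```

To loop unattended, something like this would have to be started on every boot (for example from a systemd unit, which is again an assumption); running it with KEXEC_DRY_RUN=1 first prints the commands without executing them.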

** Changed in: linux (Ubuntu Xenial)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Eoan)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Focal)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869948

Title:
  Multiple Kexec in AWS Nitro instances fail

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869948/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs