** Description changed:

- Placeholder
- To be improved
+ [Impact]
+ * Currently, users cannot perform multiple kexec loads in a row on AWS Nitro
+ instances (KVM-based); after the 2nd or 3rd kexec, initrd corruption is
+ observed, with the following signature:
+ 
+  Initramfs unpacking failed: junk within compressed archive
+ [...]
+  Kernel panic - not syncing: No working init found.
+ Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
+ CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
+ Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
+ Call Trace:
+   dump_stack+0x6d/0x9a
+   ? csum_partial_copy_generic+0x150/0x170
+   panic+0x101/0x2e3
+   ? do_execve+0x25/0x30
+   ? rest_init+0xb0/0xb0
+   kernel_init+0xfb/0x100
+   ret_from_fork+0x35/0x40
+ 
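The signature above can be searched for in a saved log (for example a console capture or a dmesg dump recovered after the crash). The helper below is only a sketch: the function name has_initrd_corruption is made up for illustration, and the two patterns are simply the messages quoted in the trace.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of the bug report.
# Succeeds if the given log file contains either message from the
# failure signature quoted above.
has_initrd_corruption() {
    grep -q -e 'Initramfs unpacking failed' -e 'No working init found' "$1"
}
```

It could be called as `has_initrd_corruption saved-dmesg.txt && echo "corruption seen"`, where saved-dmesg.txt stands for whatever file the log was saved to.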
+ * After investigation (see comment 2), we noticed that the Amazon ena
+ network driver doesn't provide a shutdown() handler, so the device could
+ still be performing DMA transactions to a previously valid address while
+ the new kernel boots, which would then corrupt kernel memory. The
+ following patch was proposed and fixed the issue, allowing 1000 kexecs
+ to complete with no issues observed: 428c491332bc ("net: ena: Add PCI
+ shutdown handler to allow safe kexec") [ git.kernel.org/linus/428c491332bc ].
+ 
+ * Hence, we are requesting an SRU for this patch. It was tested
+ successfully on all supported series (4.4, 4.15 and 5.3) on Amazon Nitro
+ instances, and was reviewed/acked by the ena driver team and by a kexec
+ developer from another distro. It is worth mentioning that we started an
+ upstream multi-vendor discussion about this issue:
+ marc.info/?l=kexec&m=158299605013194
+ 
+ [Test case]
+ 
+ * The basic test procedure consists of performing multiple kexecs
+ sequentially. AWS does not provide a full console, so in case of
+ failure one could check the instance screenshot, or use pstore/ramoops
+ to collect, after a crash, the dmesg kept in a preserved memory area.
+ The commands used to perform kexec are:
+ 
+ kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
+ systemctl kexec
+ 
+ Alternatively, one could use "--append=" instead of "--reuse-cmdline"
+ if a change to the kernel command line is desired; also, to execute the
+ kexec-loaded kernel, "kexec -e" and "systemctl kexec" are equally
+ valid.
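
The two commands above can be wrapped in a small helper. This is only a sketch: it assumes the usual Ubuntu paths /boot/vmlinuz-<version> and /boot/initrd.img-<version>, and the names kexec_load_cmd, kexec_reboot and DRY_RUN are made up for illustration. With DRY_RUN=1 (the default here) nothing is loaded or executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only: paths, variable names and function names are assumptions,
# not part of the bug report.
KVER="${KVER:-$(uname -r)}"
KERNEL="${KERNEL:-/boot/vmlinuz-$KVER}"
INITRD="${INITRD:-/boot/initrd.img-$KVER}"
DRY_RUN="${DRY_RUN:-1}"

# Print the load command used in the test case.
kexec_load_cmd() {
    printf 'kexec -l %s --initrd %s --reuse-cmdline\n' "$KERNEL" "$INITRD"
}

# Load the kernel and jump into it; with DRY_RUN=1 only print the commands.
kexec_reboot() {
    if [ "$DRY_RUN" = "1" ]; then
        kexec_load_cmd
        echo "systemctl kexec"
        return 0
    fi
    kexec -l "$KERNEL" --initrd "$INITRD" --reuse-cmdline && systemctl kexec
}
```

Running `kexec_reboot` as-is shows what would be executed; setting DRY_RUN=0 (as root) would actually kexec into the selected kernel.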
+ 
+ * In comment 3 we proposed a script/approach to auto-test kexecs, used
+ here to perform 1000 kexecs with the proposed patch.
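
One possible shape for such an automated run is a per-boot step that keeps a counter on disk and triggers the next kexec until a limit is reached. The sketch below is not the script from comment 3, which may work differently; the counter file location, the MAX_KEXECS variable and the function names are all hypothetical. The actual kexec invocation is left as a comment so the sketch is safe to run.

```shell
#!/bin/sh
# Sketch only: counter file, limit and function names are hypothetical.
COUNT_FILE="${COUNT_FILE:-/var/tmp/kexec-count}"
MAX_KEXECS="${MAX_KEXECS:-1000}"

# Print the number of kexecs issued so far (0 if no counter file exists).
read_count() {
    if [ -r "$COUNT_FILE" ]; then
        cat "$COUNT_FILE"
    else
        echo 0
    fi
}

# Meant to run once per boot, e.g. from a boot-time service.
kexec_loop_step() {
    issued="$(read_count)"
    if [ "$issued" -lt "$MAX_KEXECS" ]; then
        next=$((issued + 1))
        # Record the attempt before jumping into the next kernel.
        echo "$next" > "$COUNT_FILE"
        echo "kexec $next of $MAX_KEXECS"
        # A real run would now load and execute the kernel:
        #   kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
        #   systemctl kexec
    else
        echo "done: $issued kexecs"
    fi
}
```

If a boot fails, the counter file shows how many kexecs had been issued before the failure.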
+ 
+ [Regression Potential]
+ 
+ * Although the patch proposed here introduces a PCI shutdown handler, it
+ keeps the remove handler identical and bases the shutdown path closely on
+ ena_remove(), changing only the netdev handling, following other upstream
+ drivers. It was extensively tested and presented no issues. Also, it is
+ self-contained and affects only one driver, so other cloud providers and
+ non-cloud environments are not affected by the patch at all.
+ 
+ * A potential regression could manifest as a delay or a failure in the
+ reboot/shutdown path, and only if the ena driver is in use.

** Changed in: linux (Ubuntu Xenial)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Eoan)
       Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Focal)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869948

Title:
  Multiple Kexec in AWS Nitro instances fail

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869948/+subscriptions
