[Bug 1869948] Re: Multiple Kexec in AWS Nitro instances fail

Guilherme G. Piccoli Wed, 01 Apr 2020 14:26:25 -0700

After debugging the problem, a potential workaround was found which
alleviates but doesn't fix the issue: passing the "retain_initrd"
kernel parameter on kexec boots, which prevents the kernel from freeing
the initrd memory area. Also, it was observed that bigger initrds tend
to show the problem more consistently.
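
As a rough illustration of the workaround, the sketch below appends
"retain_initrd" to a kernel command line and builds a "kexec -l"
invocation around it. The helper names are hypothetical, and the kernel
and initrd paths are left to the caller; --initrd and --command-line
are kexec-tools options, but the exact invocation used during this
debugging is not stated here, so treat this as one possible form.

```python
def with_retain_initrd(cmdline: str) -> str:
    """Return cmdline with 'retain_initrd' appended if not already present."""
    params = cmdline.split()
    if "retain_initrd" not in params:
        params.append("retain_initrd")
    return " ".join(params)


def kexec_load_argv(kernel: str, initrd: str, cmdline: str) -> list:
    """Build the argv for 'kexec -l' (hypothetical helper, not from the bug report)."""
    return [
        "kexec",
        "-l",
        kernel,
        f"--initrd={initrd}",
        f"--command-line={with_retain_initrd(cmdline)}",
    ]
```

The resulting list could be handed to subprocess.run() as root; the
kernel loaded this way would then be booted with "kexec -e" or through
the distro's kexec-aware reboot path.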


After using pstore/ramoops to collect logs (and ftrace) on failure and
observing the same issue in multiple kernel versions (including
mainline) and in other distros, it was clear that the cause was memory
corruption. Since kexec is a fast path for reboot that does not go
through the full BIOS reset, it was conjectured that an adapter not
properly shut down on the kexec path could have its firmware issue an
invalid memory access, in the form of a DMA write to a previously valid
address, effectively corrupting an arbitrary region.

Then, it was noticed that the Amazon ena driver does not have a
shutdown handler, which is used on reboot/kexec to properly quiesce the
devices (through the call chain device_shutdown() ->
pci_device_shutdown() -> driver .shutdown() handler, if any).

In case the device has no shutdown handler, the PCI layer will clear
its bus master bit in the PCI command register, disabling the adapter.
But this operation doesn't quiesce the device's firmware, and in the
next boot, when it gets activated (i.e., its bus master bit gets set),
it may perform a buffered memory operation.
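
To make the "master bit" concrete: in PCI config space the 16-bit
Command register sits at offset 0x04, and bit 2 is Bus Master Enable.
The sketch below decodes that bit from a config-space dump such as the
one sysfs exposes for each device. The function names and the example
device address in the docstring are illustrative assumptions, not taken
from the bug report.

```python
import struct

PCI_COMMAND = 0x04         # offset of the 16-bit Command register
PCI_COMMAND_MASTER = 0x4   # bit 2: Bus Master Enable


def bus_master_enabled(config: bytes) -> bool:
    """Return True if Bus Master Enable is set in a config-space dump."""
    if len(config) < PCI_COMMAND + 2:
        raise ValueError("config space dump too short")
    # Config space is little-endian; read the 16-bit Command register.
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND)
    return bool(command & PCI_COMMAND_MASTER)


def read_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space from sysfs.

    bdf is a PCI address such as '0000:00:05.0' (hypothetical example).
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)
```

Calling bus_master_enabled(read_config(bdf)) only reports whether the
device is currently allowed to initiate DMA; it says nothing about
whether the firmware still holds a stale write, which is the condition
suspected here.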

Tests on a mainline kernel performing rmmod of the ena driver before
kexec showed that the initrd corruption no longer happened, because
rmmod calls ena_remove(), which properly shuts the adapter down before
the kexec.
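
When repeating this kind of test, it can help to confirm that the
module is really gone before kexec. A minimal sketch, assuming ena is
built as a loadable module (a built-in driver does not appear in
/proc/modules); the function name is hypothetical:

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` appears as a module in /proc/modules content.

    Each line of /proc/modules starts with the module name, followed by
    its size, reference count and other fields.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

For example, module_loaded("ena", open("/proc/modules").read()) would
be expected to return False after a successful "rmmod ena".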

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869948

Title:
  Multiple Kexec in AWS Nitro instances fail

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869948/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
