I've been running this for 3+ days now and cannot reproduce this specific issue. From the look of the error, it appears to be a hardware-related NMI issue, so perhaps we have some faulty H/W in this specific case.
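
To check whether console logs from other test runs show the same signature, a scan along the following lines may help. This is only a minimal sketch: the two message patterns are copied from the console log quoted in the bug description, while the function name `scan_console_log` and the shape of the returned dictionary are hypothetical choices made for illustration. It searches the whole text rather than individual lines, so it also works on logs whose line breaks were lost.

```python
import re

# Patterns copied from the console log quoted in the bug description.
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]+) on CPU (\d+)")
RCU_STALL = "rcu_sched self-detected stall on CPU"


def scan_console_log(text):
    """Summarise unknown-NMI and RCU stall messages found in console log text.

    Hypothetical helper: returns a dict with the number of RCU stall
    messages and a list of (reason, cpu) tuples for each unknown NMI.
    """
    # The reason code is kept as the string printed in the log.
    nmi_events = [(m.group(1), int(m.group(2))) for m in NMI_RE.finditer(text)]
    return {
        "rcu_stall": text.count(RCU_STALL),
        "nmi_events": nmi_events,
    }


if __name__ == "__main__":
    import sys

    # Usage: python scan_console_log.py <console-log-file>
    with open(sys.argv[1], errors="replace") as fh:
        print(scan_console_log(fh.read()))
```

A log with RCU stalls but no unknown-NMI message would point away from the H/W theory for that run, although the absence of the message does not rule out a hardware fault.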

After running these tests for several days, both with and without the kernel parameters, I have observed the following:

1. A reboot can take more than 10 to 15 minutes.
2. Our instances were being accidentally deleted by a Jenkins job, which could explain why some of our original assumptions that the VM had died on reboot were incorrect.
3. When rebooting almost immediately after ssh access becomes available, the reboot gets stuck with systemd issues:

     sudo reboot
     systemctl status reboot.target
     Failed to get properties: Connection timed out

   The only way to reboot at that point is:

     sudo systemctl --force reboot

This could also be a reason why the automated reboot testing got locked up and we mistakenly believed that reboots were failing due to H/W issues.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1822175

Title:
  i3.metal flavour type fails to respond after a reboot

Status in linux-aws package in Ubuntu:
  In Progress

Bug description:
  Series: Cosmic
  Instance Size: I3.Metal
  Region: (Default) US-WEST-2
  Kernel: linux-aws

  During SRU testing the i3.metal instance flavor type will sometimes fail to respond after the instance is rebooted. Usually this has been seen at least 2 or 3 times during a test cycle.

  While rebooting an I3.Metal instance on the AWS Cloud, I observed the following crash, which resulted in tearing down the instance and starting over. The instance had only been restarted ~4 times at the time of this failure.

  [  OK  ] Reached target Shutdown.
  [  OK  ] Reached target Final Step.
           Starting Reboot...
           Stopping LVM2 metadata daemon...
  [  OK  ] Stopped LVM2 metadata daemon.
  [ 447.340575] INFO: rcu_sched self-detected stall on CPU
  [ 447.340577] INFO: rcu_sched self-detected stall on CPU
  [ 447.340580] INFO: rcu_sched self-detected stall on CPU
  [ 447.340587] INFO: rcu_sched self-detected stall on CPU
  [ 447.340590] INFO: rcu_sched self-detected stall on CPU
  [ 447.340592] INFO: rcu_sched self-detected stall on CPU
  [ 447.340595] Uhhuh. NMI received for unknown reason 21 on CPU 0.
  [ 447.340599] INFO: rcu_sched self-detected stall on CPU
  [ 447.340602] INFO: rcu_sched self-detected stall on CPU
  [ 447.340606] INFO: rcu_sched self-detected stall on CPU
  [ 447.340614] 53-...!: (43 GPs behind) idle=7ce/1/0 softirq=392/392 fqs=0
  [ 447.340617] INFO: rcu_sched self-detected stall on CPU
  [ 447.340621] Do you have a strange power saving mode enabled?
  [ 447.340628] 1-...!: (1 ticks this GP) idle=79e/1/0 softirq=881/881 fqs=0
  [ 447.340632] INFO: rcu_sched self-detected stall on CPU
  [ 447.340634] INFO: rcu_sched self-detected stall on CPU
  [ 447.340636] INFO: rcu_sched self-detected stall on CPU
  [ 447.340639] INFO: rcu_sched self-detected stall on CPU
  [ 447.340641] INFO: rcu_sched self-detected stall on CPU
  [ 447.340644] INFO: rcu_sched self-detected stall on CPU
  [ 447.340647] INFO: rcu_sched self-detected stall on CPU

  The full log can be seen in the attached file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1822175/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp