I tried to reproduce the bug but couldn't. My Ubuntu 18.04 VM (in West US 2, the VM size is: Basic A3 (4 vcpus, 7 GiB memory)) is still running fine, after I rebooted the VM 100 times with the below /etc/rc.local script:
#!/bin/bash date >> /root/reboot.log NUM=`wc -l /root/reboot.log | cut -d' ' -f1` [ $NUM -le 100 ] && reboot In Kirk's log, it looks the VM failed to obtain an IP via DHCP (or somehow the VM's network went DOWN?), and there are a lot of lines of WARNING ExtHandler CGroup walinuxagent.service: Crossed the Memory Threshold. Current Value: 627945472 bytes, Threshold: 512 megabytes. I don't know what the line means. I don't see any issue in Linux kernel and drivers. I guess the issue may be in the waagent/cloud-init daemons, or somewhere outside the VM. Since I can not reproduce the issue, and it looks nobody can reproduce the issue now (it looks Kirk's VM works fine now, after the VM was Stopped and Started again), I can not further debug the issue. If somebody manages to repro the issue again, please check if you're still able to login the VM via Azure serial console; if yes, please check if the VM has an IP address by "ifconfig -a"; if the VM has no IP, we'll have to check the syslog (and/or the network manager's log?) to find out what goes wrong; if the VM has an IP but is unable to communicate to the outside world, something may be wrong with the Linux network device driver. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1822133 Title: Azure Instance never recovered during series of instance reboots. Status in linux-azure package in Ubuntu: In Progress Bug description: Description: During SRU Testing of various Azure Instances, there will be some cases where the instance will not respond following a system reboot. SRU Testing only restarts a giving instance once, after it preps all of the necessary files to-be-tested. Series: Disco Instance Size: Basic_A3 Region: (Default) US-WEST-2 Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux I initiated a series of tests which rebooted Azure Cloud instances 50 times. During the 49th Reboot, an Instance failed to return from a reboot.. Upon grabbing the console output the following was seen scrolling endlessly. I have seen this failure in cases where the instance only restarted a handful of times >5 [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus [ 84.247704]hyperv_fb: unable to send packet via vmbus In another test attempt I saw the following failure: ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes ERROR ExtHandler /proc/net/route contains no routes Both of these failures broke networking, Both of these failures were seen at least twice to three times, thus may explain why in some cases we never recover from an instance reboot. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1822133/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp