I tried to reproduce the bug but couldn't.

My Ubuntu 18.04 VM (in West US 2, the VM size is: Basic A3 (4 vcpus, 7
GiB memory)) is still running fine, after I rebooted the VM 100 times
with the below /etc/rc.local script:

#!/bin/bash
date >> /root/reboot.log
NUM=`wc -l /root/reboot.log | cut -d' ' -f1`
[ $NUM -le 100 ] && reboot

In Kirk's log, it looks the VM failed to obtain an IP via DHCP (or
somehow the VM's network went DOWN?), and there are a lot of lines of

WARNING ExtHandler CGroup walinuxagent.service: Crossed the Memory
Threshold. Current Value: 627945472 bytes, Threshold: 512 megabytes.

I don't know what the line means.

I don't see any issue in Linux kernel and drivers. I guess the issue may
be in the waagent/cloud-init daemons, or somewhere outside the VM.

Since I can not reproduce the issue, and it looks nobody can reproduce
the issue now (it looks Kirk's VM works fine now, after the VM was
Stopped and Started again), I can not further debug the issue.

If somebody manages to repro the issue again, please check if you're
still able to login the VM via Azure serial console; if yes, please
check if the VM has an IP address by "ifconfig -a"; if the VM has no IP,
we'll have to check the syslog (and/or the network manager's log?) to
find out what goes wrong; if the VM has an IP but is unable to
communicate to the outside world, something may be wrong with the Linux
network device driver.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1822133

Title:
  Azure Instance never recovered during series of instance reboots.

Status in linux-azure package in Ubuntu:
  In Progress

Bug description:
  Description: During SRU Testing of various Azure Instances, there will
  be some cases where the instance will not respond following a system
  reboot.  SRU Testing only restarts a giving instance once, after it
  preps all of the necessary files to-be-tested.

  Series: Disco
  Instance Size: Basic_A3
  Region: (Default) US-WEST-2
  Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux

  I initiated a series of tests which rebooted Azure Cloud instances 50
  times. During the 49th Reboot, an Instance failed to return from a
  reboot.. Upon grabbing the console output the following was seen
  scrolling endlessly. I have seen this failure in cases where the
  instance only restarted a handful of times >5

  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus
  [    84.247704]hyperv_fb: unable to send packet via vmbus

  In another test attempt I saw the following failure:

  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes

  
  Both of these failures broke networking, Both of these failures were seen at 
least twice to three times, thus may explain why in some cases we never recover 
from an instance reboot.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1822133/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to