Public bug reported:

I am running a single nova-network node as gateway and have about 20
KVM instances spread over 4 compute nodes (one of which is also the
controller node). Everything runs Ubuntu 12.04 LTS.

From time to time, one instance or another loses connectivity: it still
has its IP address (DHCP lease times raised to 7 days), but no
communication is possible in either direction.

This pretty much looks like some kind of networking problem, but what
exactly stopped working?

I connected to the failing KVM instance via VNC and checked its
interface, which looks normal (like those of the working instances).

On the hypervisor, I see the following state:
root@colossus09:~# ifconfig
br100     Link encap:Ethernet  HWaddr 00:25:90:49:d9:04  
          inet6 addr: fe80::2c48:74ff:fe22:a6cb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85845 errors:0 dropped:15 overruns:0 frame:0
          TX packets:7463 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2906526 (2.9 MB)  TX bytes:641770 (641.7 KB)

eth0      Link encap:Ethernet  HWaddr 00:25:90:49:d9:04  
          inet addr:10.10.30.189  Bcast:10.10.31.255  Mask:255.255.224.0
          inet6 addr: fe80::225:90ff:fe49:d904/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1359563761 errors:0 dropped:174156 overruns:2 frame:0
          TX packets:1222020947 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1111996716949 (1.1 TB)  TX bytes:673176161112 (673.1 GB)
          Memory:fafe0000-fb000000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:4628041 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4628041 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1060925632 (1.0 GB)  TX bytes:1060925632 (1.0 GB)

vlan100   Link encap:Ethernet  HWaddr 00:25:90:49:d9:04  
          inet6 addr: fe80::225:90ff:fe49:d904/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:909059394 errors:0 dropped:0 overruns:0 frame:0
          TX packets:907044613 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1053993706102 (1.0 TB)  TX bytes:641297608033 (641.2 GB)

vnet0     Link encap:Ethernet  HWaddr fe:16:3e:3e:f4:58  
          inet6 addr: fe80::fc16:3eff:fe3e:f458/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:62963968 errors:0 dropped:0 overruns:0 frame:0
          TX packets:61960786 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:52542425624 (52.5 GB)  TX bytes:84912733569 (84.9 GB)

vnet1     Link encap:Ethernet  HWaddr fe:16:3e:01:ec:81  
          inet6 addr: fe80::fc16:3eff:fe01:ec81/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1280 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:110032 (110.0 KB)  TX bytes:2461222 (2.4 MB)

vnet2     Link encap:Ethernet  HWaddr fe:16:3e:3c:46:1b  
          inet6 addr: fe80::fc16:3eff:fe3c:461b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34725792 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35909449 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:2321718823 (2.3 GB)  TX bytes:10039460160 (10.0 GB)

vnet0 is almost certainly the tap device of the failing KVM instance.

root@colossus09:~# ps afx | grep /kvm
 5080 pts/12   S+     0:00                      \_ grep --color=auto /kvm
 1811 ?        Sl   848:32 /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 16384 -smp 
8,sockets=8,cores=1,threads=1 -name instance-00000036 -uuid 
6dee1800-6e1e-42dd-abe9-8d8efa752bc5 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000036.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/nova/instances/instance-00000036/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/nova/instances/instance-00000036/disk.local,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 
-netdev tap,fd=21,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:3c:46:1b,bus=pci.0,addr=0x3 
-chardev 
file,id=charserial0,path=/var/lib/nova/instances/instance-00000036/console.log 
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -usb -device 
usb-tablet,id=input0 -vnc 0.0.0.0:2 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
 2275 ?        Sl   4235:21 /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 32768 -smp 
20,sockets=20,cores=1,threads=1 -name instance-00000011 -uuid 
48e3db02-a8ec-4140-8faa-d1f1f101ef29 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000011.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/nova/instances/instance-00000011/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=17,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:3e:f4:58,bus=pci.0,addr=0x3 
-chardev 
file,id=charserial0,path=/var/lib/nova/instances/instance-00000011/console.log 
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -usb -device 
usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 2667 ?        Sl    28:37 /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 512 -smp 
1,sockets=1,cores=1,threads=1 -name instance-0000000f -uuid 
cb9aed4b-5daa-4c1c-85a6-9101adddde8d -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000000f.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/nova/instances/instance-0000000f/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=16,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:01:ec:81,bus=pci.0,addr=0x3 
-chardev 
file,id=charserial0,path=/var/lib/nova/instances/instance-0000000f/console.log 
-device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -usb -device 
usb-tablet,id=input0 -vnc 0.0.0.0:1 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
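
The guess about vnet0 can be checked against the two listings above. In this output each vnetN device carries the MAC address of its guest with the first octet changed from fa to fe (for example fe:16:3e:3e:f4:58 on vnet0 and mac=fa:16:3e:3e:f4:58 on one kvm command line). The following is only a sketch built on that observation; the function names tap_to_guest_mac and instance_for_mac are made up for illustration, and the fe/fa relationship is assumed to hold for every tap device on the host.

```shell
# Assumption (taken from the ifconfig/ps output above): the tap device MAC
# equals the guest MAC with the leading "fa" octet replaced by "fe".
tap_to_guest_mac() {
    # Turn the MAC shown for vnetN into the MAC configured inside the guest.
    printf '%s\n' "$1" | sed 's/^fe:/fa:/'
}

# Read process command lines on stdin (one process per line) and print the
# value of "-name" for the kvm process whose NIC uses the given guest MAC.
instance_for_mac() {
    sed -n "/mac=$1/s/.* -name \([^ ]*\).*/\1/p"
}
```

A possible invocation on the hypervisor would be `ps -eo args | instance_for_mac "$(tap_to_guest_mac fe:16:3e:3e:f4:58)"`. With the process list shown above, that should name instance-00000011 as the guest behind vnet0.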

root@colossus09:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.10.30.4      0.0.0.0         UG    100    0        0 eth0
10.10.0.0       0.0.0.0         255.255.224.0   U     0      0        0 eth0

root@colossus09:~# brctl show
bridge name     bridge id               STP enabled     interfaces
br100           8000.00259049d904       no              vlan100
                                                        vnet0
                                                        vnet1
                                                        vnet2

root@colossus09:~# dmesg | grep vnet0 | tail -n 5
[827452.395730] br100: port 2(vnet0) entering disabled state
[827468.595961] device vnet0 entered promiscuous mode
[827468.661699] br100: port 2(vnet0) entering forwarding state
[827468.661705] br100: port 2(vnet0) entering forwarding state
[827479.315601] vnet0: no IPv6 routers present


root@colossus09:~# brctl showmacs br100
port no mac addr                is local?       ageing timer
  1     00:25:90:2b:63:de       no                22.36
  1     00:25:90:49:bf:ce       no                21.91
  1     00:25:90:49:bf:e2       no                22.56
  1     00:25:90:49:d9:04       yes                0.00
  3     fa:16:3e:01:ec:81       no               107.35
  1     fa:16:3e:14:b4:16       no                21.90
  1     fa:16:3e:28:e0:ab       no                21.74
  1     fa:16:3e:2b:be:38       no                21.65
  1     fa:16:3e:31:92:53       no                21.78
  1     fa:16:3e:3b:74:7a       no                21.92
  4     fa:16:3e:3c:46:1b       no                 0.00
  1     fa:16:3e:3d:ff:f3       no                 0.00
  2     fa:16:3e:3e:f4:58       no                 0.77
  1     fa:16:3e:42:8f:59       no                22.36
  1     fa:16:3e:43:bb:04       no                21.92
  1     fa:16:3e:47:65:c8       no                21.99
  1     fa:16:3e:4f:e6:4c       no                21.78
  1     fa:16:3e:57:a7:e6       no                22.50
  1     fa:16:3e:59:8d:93       no                 0.50
  1     fa:16:3e:64:fc:b5       no                21.71
  1     fa:16:3e:67:dc:73       no                22.27
  1     fa:16:3e:72:7f:3d       no                21.80
  1     fa:16:3e:7f:8c:5c       no                22.03
  3     fe:16:3e:01:ec:81       yes                0.00
  4     fe:16:3e:3c:46:1b       yes                0.00
  2     fe:16:3e:3e:f4:58       yes                0.00

I then added an IP address to br100 so that I could test directly with ping.


root@colossus09:~# ip addr add 10.10.40.90/21 dev br100
root@colossus09:~# ip route flush table cache


root@colossus09:~# ping -c1 10.10.40.17
PING 10.10.40.17 (10.10.40.17) 56(84) bytes of data.
64 bytes from 10.10.40.17: icmp_req=1 ttl=64 time=0.533 ms

--- 10.10.40.17 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.533/0.533/0.533/0.000 ms
root@colossus09:~# ping -c1 10.10.40.9
PING 10.10.40.9 (10.10.40.9) 56(84) bytes of data.

--- 10.10.40.9 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

The first ping, to 10.10.40.17 (a KVM instance on this same host, on vnet1 or
vnet2), works. The second ping, to the failing KVM instance at 10.10.40.9,
times out.
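
For this ping test to say anything about the bridge, the pinged addresses have to fall inside the 10.10.40.90/21 network that was added to br100; otherwise the packets would leave via the default route on eth0. The sketch below checks that with plain shell arithmetic. The helper names ip_to_int and in_subnet are hypothetical, and the code assumes well-formed dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# usage: in_subnet ADDRESS INTERFACE_ADDRESS PREFIX_LENGTH
# Succeeds (exit status 0) if both addresses share the same network prefix.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

For example, `in_subnet 10.10.40.9 10.10.40.90 21` succeeds, as does the same check for 10.10.40.17 and 10.10.40.1, while `in_subnet 10.10.30.189 10.10.40.90 21` fails because the eth0 address lies outside that /21.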


root@colossus09:~# tcpdump -c 4 -n -i vnet0
tcpdump: WARNING: vnet0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:44:19.514639 ARP, Request who-has 10.10.40.1 tell 10.10.40.9, length 28
10:44:20.514110 ARP, Request who-has 10.10.40.1 tell 10.10.40.9, length 28
10:44:21.902920 ARP, Request who-has 10.10.40.1 tell 10.10.40.9, length 28
10:44:22.902762 ARP, Request who-has 10.10.40.1 tell 10.10.40.9, length 28
4 packets captured
4 packets received by filter

The above shows (I think) that 10.10.40.9 wants to know the MAC address of
10.10.40.1, but no one answers. I might be misinterpreting this, but at
least something is not answering.

I can see the same ARP requests via tcpdump when inside the KVM instance
(via VNC).
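
To make the "no one answers" reading less of a guess, the capture can be filtered for ARP requests that never get a matching reply. The following is a minimal sketch, assuming the `tcpdump -n` line format shown above for requests and the usual "ARP, Reply <ip> is-at <mac>" form for replies; the function name unanswered_arp_targets is made up for illustration.

```shell
# Read "tcpdump -n" output on stdin and print every IP address that was
# asked for in an ARP request but never announced in an ARP reply.
unanswered_arp_targets() {
    awk '
        $2 == "ARP," && $3 == "Request" && $4 == "who-has" { asked[$5] = 1 }
        $2 == "ARP," && $3 == "Reply"                      { answered[$4] = 1 }
        END {
            for (ip in asked)
                if (!(ip in answered))
                    print ip
        }
    ' | sort
}
```

It could be fed from a live capture, for example `tcpdump -l -c 20 -n -i vnet0 arp | unanswered_arp_targets`. Running the same capture on vlan100 and on the nova-network node might then show at which hop the requests or the replies disappear.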

What can I do to *fix* this?

For me, this incident is major, since we just cannot add more production
instances until we have fixed this. :-(

Best regards,
Christian.

** Affects: nova
     Importance: Undecided
         Status: New

** Affects: ubuntu
     Importance: Undecided
         Status: New

** Also affects: ubuntu
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1016848

Title:
  KVM instance stops communicating after some time

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1016848/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
