Package: qemu-kvm
Version: 1.0+dfsg-9
Severity: normal

I cannot get optimal network throughput on KVM guests using Debian Wheezy (and 
stable) as KVM host.
It is not horribly bad, just not good compared to relevant alternatives.

I have tried Ubuntu Server 11.10, Proxmox 1.9, Proxmox 2.0 and Fedora 17 Alpha 
as 1:1 replacements for the Debian KVM host using the same guests (just 
preserving the LVM volumes between installs). They all manage about 20 
gbit/s guest to guest in a simple iperf test, while Debian only manages 
about 2.3 gbit/s with high CPU usage. On Debian the CPU usage is generally much 
higher for all guest network activity, and in some cases a guest cannot even 
saturate a gigabit link (when traffic comes from another subnet) without maxing 
out a CPU core, where the other KVM hosts barely break a sweat.

Disc I/O is very good and the guests feel snappy, so it doesn't seem like 
anything is really wrong, just something suboptimal with the networking.
The issue follows only the host OS, as the guests have been the same in all 
comparisons (Debian Wheezy).


To reproduce:
------------
Install Debian Wheezy in guests (minimal with SSH and ntp)
Install iperf via apt-get
Configure network

Run test:
guest1: iperf -s
guest2: iperf -c <iperf-server> -i 2 -t 33333
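
To make the numbers easier to compare between hosts, the client output can be
reduced to one figure per interval. This is only a sketch: it assumes iperf 2's
CSV report style (-y C) and that the last field of each CSV line is the
bandwidth in bits per second. The function name csv_to_gbit is made up for
illustration.

```shell
# Convert iperf 2 CSV report lines (read from stdin) to gbit/s.
# Assumption: the last comma-separated field is the bandwidth in bits/s.
csv_to_gbit() {
    awk -F, '{ printf "%.2f\n", $NF / 1e9 }'
}
```

Possible usage on guest2: iperf -c <iperf-server> -i 2 -t 60 -y C | csv_to_gbit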

My results:
----------------------------
- Guest to guest performance via local bridge: ~2.3 gbit/s, very high CPU usage 
on vhost-$PID and kvm process on host
- Physical server to guest on same subnet: ~940 mbit/s but with very high CPU 
usage on vhost-$PID and kvm process on host
- Physical server to guest via router: ~850 mbit/s with very high CPU usage on 
vhost-$PID and kvm process on host (why is routed traffic slower than switched 
on the guest??)
- Physical server to kvm host via router (just to verify that the router is not 
the issue): ~940 mbit/s with almost no CPU usage

Expected results, based on comparison with other KVM hosts, everything else the same:
-------------------------
- Guest to guest performance via local bridge: ~20 gbit/s, high CPU usage
- Physical server to guest on same subnet: ~940 mbit/s with low CPU usage on 
vhost-$PID and a bit higher on kvm process on host
- Physical server to guest via router: ~940 mbit/s with low CPU usage on 
vhost-$PID and a bit higher on kvm process on host 
- Physical server to kvm host via router (just to verify that the router is not 
the issue): ~940 mbit/s with almost no CPU usage (the same as my current 
results)

Compare results with other OSes on same machine (guest to guest via bridge):
Ubuntu Server 11.10 (virtualization host): ~19 gbit/s
Proxmox VE 2.0: ~20 gbit/s
Fedora 17 alpha: ~20 gbit/s

VMWare ESXi 5 with VMXNET3: ~22 gbit/s
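
To put the gap in one number, the guest-to-guest figures above can be divided
by the ~2.3 gbit/s measured on Debian. A small sketch (the helper name slowdown
is made up for illustration, and the inputs are the approximate figures from
this report):

```shell
# Print how many times faster a reference throughput is than a measured one.
# $1 = reference throughput, $2 = measured throughput (same unit).
slowdown() {
    awk -v ref="$1" -v meas="$2" 'BEGIN { printf "%.1f\n", ref / meas }'
}

slowdown 20 2.3   # Proxmox VE 2.0 / Fedora 17 alpha vs. Debian, prints 8.7
```

So the other KVM hosts are roughly 8 to 9 times faster guest to guest on the
same hardware.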


Host details:
---------------------
OS: Debian Wheezy (testing), kernel 3.2.0-2-amd64, currently based on 3.2.12

virsh qemu-monitor-command --hmp mail 'info version'
1.0.0 (Debian qemu-kvm 1.0+dfsg-9)

virsh qemu-monitor-command --hmp mail 'info kvm':
kvm support: enabled

lsmod | grep kvm:
kvm_intel             121968  9
kvm                   287572  1 kvm_intel

lsmod | grep vhost:
vhost_net              27436  3
tun                    18337  7 vhost_net
macvtap                17598  1 vhost_net
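
For anyone comparing hosts, the same module check can be scripted. A sketch
only: has_module is a made-up helper name, and it matches only the first
column of lsmod-style output, so a module that merely appears in the
"Used by" column is not counted.

```shell
# Succeed if module $1 is listed in lsmod-style output read from stdin.
# Only the first column (the module name) is compared.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

Possible usage: lsmod | has_module vhost_net && echo "vhost_net loaded"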

Enabling or disabling KSM makes no difference to the results, but here are my 
parameters with it on:
echo "1" > /sys/kernel/mm/ksm/run
echo "200" > /sys/kernel/mm/ksm/sleep_millisecs


Output from ps -ef of running guest:
/usr/bin/kvm -S -M pc-0.15 -cpu
core2duo,+lahf_lm,+rdtscp,+avx,+osxsave,+xsave,+aes,+popcnt,+x2apic,+sse4.2,+sse4.1,+pdcm,+xtpr,+cx16,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds
-enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name mail -uuid
ccace357-783d-ce9f-444a-419445ee601d -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/mail.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -drive
file=/dev/raid10/mail,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f7:25:33,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -device
usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
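
The relevant part of that command line is the tap netdev with vhost=on. When
comparing against another host, this can be checked mechanically. A sketch
under assumptions: uses_vhost is a made-up helper name, and it only looks for
a space-separated argument that starts with "tap," and contains vhost=on.

```shell
# Read a qemu/kvm command line from stdin and succeed if a tap netdev
# argument carries vhost=on.
uses_vhost() {
    tr ' ' '\n' | grep -Eq '^tap,.*vhost=on(,|$)'
}
```

Possible usage: ps -o args= -p <kvm-pid> | uses_vhost && echo "vhost enabled"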


Server hardware (seems to be the same issue regardless of server used):
- Intel(R) Xeon(R) CPU E31220 @ 3.10GHz Quad Core
- 16 GB ECC RAM
- Supermicro X9SCI-LN4F (Quad Intel Server NICs using e1000e)
- System disc: Corsair SSD Force Series 3 60GB
- Storage for guests: LVM images on directly attached RAID10


Guest details:
---------
OS: Debian Wheezy (testing), kernel 3.2.0-2-amd64, currently based on 3.2.12

root@mail:~# lsmod | grep virtio:
virtio_balloon         12832  0
virtio_blk             12874  3
virtio_net             17808  0
virtio_pci             13207  0
virtio_ring            12969  4 virtio_pci,virtio_net,virtio_blk,virtio_balloon
virtio                 13093  5 virtio_ring,virtio_pci,virtio_net,virtio_blk,virtio_balloon


I have tried:
----------------
- Replacing Debian Wheezy with Debian Squeeze (stable, kernel 2.6.32-xx) - even 
worse results
- Replacing kernel 3.2.0-2-amd64 with vanilla kernel 3.4-rc2 and a config based 
on Debian's included config - no apparent change
- Extracting the kernel config file from Fedora 17 alpha's kernel and using it 
to compile a new kernel based on Debian Wheezy's kernel source - slightly worse 
results
- Installing the Proxmox VE 2.0 kernel in Debian - results are the same
- ...in addition to replacing Debian with Ubuntu Server 11.10, Fedora 17 
alpha, Proxmox 1.9 and 2.0 and ESXi 5, which all have the expected network 
performance using virtio.


Please optimize KVM/vhost in Debian so it performs like the other alternatives.



-- Package-specific info:


/proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping        : 7
microcode       : 0x1b
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm 
ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6186.08
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping        : 7
microcode       : 0x1b
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm 
ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6185.89
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping        : 7
microcode       : 0x1b
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm 
ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6185.90
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping        : 7
microcode       : 0x1b
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm 
ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6185.90
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:




-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages qemu-kvm depends on:
ii  adduser           3.113+nmu1
ii  ipxe-qemu         1.0.0+git-20120202.f6840ba-3
ii  libaio1           0.3.109-2
ii  libasound2        1.0.25-2
ii  libbluetooth3     4.99-2
ii  libbrlapi0.5      4.3-2
ii  libc6             2.13-27
ii  libcurl3-gnutls   7.25.0-1
ii  libglib2.0-0      2.30.2-6
ii  libgnutls26       2.12.18-1
ii  libiscsi1         1.0.1-1
ii  libjpeg8          8d-1
ii  libncurses5       5.9-4
ii  libpng12-0        1.2.49-1
ii  libpulse0         1.1-3+b1
ii  librados2         0.43-1
ii  librbd1           0.43-1
ii  libsasl2-2        2.1.25.dfsg1-4
ii  libsdl1.2debian   1.2.15-2
ii  libspice-server1  0.10.1-2
ii  libtinfo5         5.9-4
ii  libuuid1          2.20.1-4
ii  libvdeplug2       2.3.2-4
ii  libx11-6          2:1.4.4-4
ii  python            2.7.2-10
ii  qemu-keymaps      1.0.1+dfsg-1
ii  qemu-utils        1.0.1+dfsg-1
ii  seabios           1.6.3-2
ii  vgabios           0.7a-2
ii  zlib1g            1:1.2.6.dfsg-2

Versions of packages qemu-kvm recommends:
ii  bridge-utils  1.5-2
ii  iproute       20120319-1

Versions of packages qemu-kvm suggests:
pn  debootstrap  <none>
pn  samba        <none>
pn  vde2         <none>

-- no debconf information


