Hi

On Wed, Jan 8, 2020 at 5:57 AM V. <[email protected]> wrote:
>
> Hi List,
>
> For my VM setup I tend to use a lot of VM-to-VM single network links to do
> routing, switching and bridging in VMs instead of on the host.
> Also stemming from a silly fetish to sometimes use some OpenBSD VMs as
> firewalls, but that is beside the point here.
> I am using the standard, tried and true method of a whole bunch of
> bridges, each having 2 vhost taps.
> This works and it's fast, but it is a nightmare to manage with all the
> interfaces on the host.
>
> So, I looked a bit into how I can improve this, which basically comes down
> to "how to connect 2 VMs together in a really fast and easy way".
> This, however, is not as straightforward as I thought, short of going the
> whole route of OVS/Snabb/any other big, feature-bloated software switch.
> Because really, all I want is to connect 2 VMs in a fast and easy way.
> Shouldn't be that hard, right?
>
> Anyway, I ended up finding tests/vhost-user-bridge.c, which very nicely
> does half of what I wanted.
> After doubling the vhosts and eliminating UDP, I came up with a Vhost-user
> Cross Cable (patch in next post).
> It just opens 2 vhost sockets instead of 1 and forwards between them.
> A terrible hack-and-slash of vhost-user-bridge.c, probably now with bugs
> causing the death of many puppies and the end of humanity, but it works!
>
> However... I am now left with some questions, which I hope some of you can
> answer.
>
> 1.
> I looked, googled, read and tried things, but it is likely that I am a
> complete and utter moron and my google-fu has just been awful...
> Very likely... But is there really no other way than the one I found to
> just link up 2 QEMUs in a fast, non-bridge way? (No, not sockets.)
> Not that OVS and the like are not fine software, but do we really need the
> whole of DPDK to do this?

By "not sockets", you mean the data path should use shared memory? Then I
don't think there is another way.

> 2.
> In the unlikely case that I'm not an idiot, then I guess we now have a
> nice, simple cross cable.
> However, I am still a complete vhost/virtio idiot who has no clue how it
> works and just randomly brute-forced code into submission.
> Maybe not entirely true, but I would still very much appreciate it if
> someone with more vhost knowledge had a quick look at how things are done
> in cc.
>
> Specifically this monstrosity in TX (speed_killer is a 1MB buffer and
> kills any speed):
>
>   ret = iov_from_buf(sg, num, 0, speed_killer,
>                      iov_to_buf(out_sg, out_num, 0, speed_killer,
>                                 MIN(iov_size(out_sg, out_num),
>                                     sizeof speed_killer)));
>
> vs. the commented:
>
>   //ret = iov_copy(sg, num, out_sg, out_num, 0,
>   //               MIN(iov_size(sg, num), iov_size(out_sg, out_num)));
>
> The first is obviously a quick fix to get things working; however, in my
> meager understanding, shouldn't the second one work?
> Maybe I'm messing up my vectors here, or my understanding of iov_copy,
> but shouldn't the second form be the way to zero-copy?

As you noted, the data must be copied from source to destination memory.
iov_copy() doesn't actually do that: it only copies iovec descriptors into
a new sub-vector, not the bytes they point to. I don't think we have an
iov helper that copies data between two scatter-gather lists.
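If you want to drop the bounce buffer, a direct iov-to-iov data copy is
easy enough to write yourself. Something like the following (an untested
sketch, not an existing QEMU helper; the name iov_copy_data and the MIN
fallback are mine):

  #include <string.h>
  #include <sys/uio.h>

  #ifndef MIN
  #define MIN(a, b) ((a) < (b) ? (a) : (b))
  #endif

  /* Copy the data (not the descriptors) from one scatter-gather list to
   * another, walking both vectors in parallel and memcpy'ing the overlap
   * of the current elements.  Returns the number of bytes copied, i.e.
   * MIN(iov_size(dst), iov_size(src)). */
  static size_t iov_copy_data(const struct iovec *dst, unsigned dst_num,
                              const struct iovec *src, unsigned src_num)
  {
      unsigned di = 0, si = 0;    /* current element in dst/src */
      size_t doff = 0, soff = 0;  /* offset within the current element */
      size_t total = 0;

      while (di < dst_num && si < src_num) {
          size_t len = MIN(dst[di].iov_len - doff, src[si].iov_len - soff);

          memcpy((char *)dst[di].iov_base + doff,
                 (const char *)src[si].iov_base + soff, len);
          total += len;
          doff += len;
          soff += len;
          /* Advance to the next element once the current one is full. */
          if (doff == dst[di].iov_len) { di++; doff = 0; }
          if (soff == src[si].iov_len) { si++; soff = 0; }
      }
      return total;
  }

That is still one copy, of course: the buffers live in two different
guests' memory, so a copy is unavoidable on this path.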
> 3.
> Now, if Cross Cable is actually a new and (after a code rewrite or 10)
> viable way to connect 2 QEMUs together, could I actually suggest a better
> way?
> I am thinking of a '-netdev vhost-user-slave' option to connect (as client
> or server) to a master QEMU running '-netdev vhost-user'.
> This way there is no need for any external program at all, the master can
> have its memory unshared, and everything would just work and be fast.
> The whole thing could also fall back to normal virtio if memory is not
> shared, and it would even work in pure usermode without any context
> switch.
>
> I could maybe get around to building a patch for this idea; I don't have a
> clear picture of how much work it would be, but I've done crazier things.
> Or is this something that someone who actually has a clue about vhost and
> virtio might be able to whip up in an hour or two? ;-)

I believe https://wiki.qemu.org/Features/VirtioVhostUser is what you are
after. It's still being discussed and is non-trivial, but it has not been
very active lately, AFAIK.

> 4.
> Hacking cc together from the bridge, I noticed the use of container_of()
> to get from vudev to the state in the vu callbacks.
> Would it be an idea to add a context pointer to the callbacks (possibly
> taken from VuDevIface)?
> And I know: first post, and I have the forwardness to suggest an API
> change! I know!
> But it makes things a bit simpler to avoid globals, and it makes sense to
> have some context in a callback so you know what's going on, right? ;-)

Well, the callbacks are called with the VuDev, so container_of() is quite
fine: you can embed the device in your own structure. I don't see a
compelling reason to change that.
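That is the usual pattern, roughly like this (a sketch: VuDev and the
queue_set_started callback signature come from libvhost-user, while CCSide
and its fields are made-up state for illustration):

  #include "qemu/osdep.h"     /* container_of() */
  #include "libvhost-user.h"  /* VuDev, VuDevIface */

  /* Hypothetical per-side state for a cross-cable, embedding the VuDev. */
  typedef struct CCSide {
      VuDev vudev;            /* embedded libvhost-user device */
      int sock_fd;            /* hypothetical per-side state */
      struct CCSide *peer;    /* the other end of the cable */
  } CCSide;

  static void cc_queue_set_started(VuDev *dev, int qidx, bool started)
  {
      /* Recover our state from the embedded VuDev: no globals needed. */
      CCSide *side = container_of(dev, CCSide, vudev);

      (void)side->peer;       /* ... e.g. kick the other side ... */
  }

Since the VuDev is embedded in your structure, every callback already
carries the context you need.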
> 5.
> Last one, promise.
> I'm very much in the church of "less software == less bugs == less
> security problems".
> Running cc or a vhost-user-slave means QEMU has fast networking in
> usermode without the need for anything other than AF_UNIX + shared
> memory.
> So might it be possible to weed out all the modern fancy stuff like the
> Internet Protocol, TCP, taps, bridges, ethernet and token ring from a
> kernel and run QEMU on that?
> The idea here is a kernel with storage, a serial console, AF_UNIX and
> vfio-pci, only running QEMU.
> Would this be feasible? Or does QEMU need a kernel that has at least some
> grasp of what AF_INET and ethernet are?
> (Does a modern kernel even still support token ring? No idea; probably
> does.)

Sounds like it is possible.

> Finally, an example and some numbers.
>
> Compiling and starting the cross cable:
>
>   ./configure
>   make tests/vhost-user-cc
>   tests/vhost-user-cc -l /tmp/left.sock -r /tmp/right.sock
>
> (Note: the cross cable will quit if one of the VMs quits, but the VMs
> will reconnect when cc starts again.)
>
> 2 VMs, host1 and host2, Linux guests, run like this:
>
> host1:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host1,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:aa:aa:aa,br=br0 \
>   -chardev socket,id=left,path=/tmp/left.sock,reconnect=1 \
>   -nic vhost-user,chardev=left,id=eth1,model=virtio-net-pci,mac=52:54:00:bb:bb:bb
>
> host2:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host \
>   -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host2,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:cc:cc:cc,br=br0 \
>   -chardev socket,id=right,path=/tmp/right.sock,reconnect=1 \
>   -nic vhost-user,chardev=right,id=eth1,model=virtio-net-pci,mac=52:54:00:dd:dd:dd
>
> First, speed via eth0 (bridged tap with vhost, host2 runs './iperf3 -s'):
>
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.0.2 -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  10.7 GBytes  9.22 Gbits/sec    receiver
>
> Second, speed via eth1 (Vhost Cross Cable):
>
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.1.2 -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  2.05 GBytes  1.76 Gbits/sec    receiver
>
> So, roughly a factor-of-5 slowdown against the bridge (9.22 vs. 1.76
> Gbits/sec). Not too bad, considering the bad iovec mem-copying I do.
> Lots of room for improvement, though, but at least for me it is also
> about 5 times faster than a socket.

And what performance do you get with -netdev socket?
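For comparison, I mean something along these lines, replacing the
vhost-user NIC on each side (untested; the loopback address and port are
made up):

  host1:
    ... -netdev socket,id=eth1,listen=127.0.0.1:1234 \
        -device virtio-net-pci,netdev=eth1,mac=52:54:00:bb:bb:bb

  host2:
    ... -netdev socket,id=eth1,connect=127.0.0.1:1234 \
        -device virtio-net-pci,netdev=eth1,mac=52:54:00:dd:dd:dd

That would give a baseline for the same two guests without shared memory
on the data path.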
--
Marc-André Lureau