I want to continue the discussion regarding using MUSER (https://github.com/nutanix/muser) as a device offloading mechanism. The main drawback of MUSER is that it requires a kernel module, so I've experimented with a proof of concept of what MUSER would look like if we didn't need a kernel module. I did this by implementing a wrapper library (https://github.com/tmakatos/libpathtrap) that intercepts accesses to VFIO-related paths and forwards them over a UNIX domain socket to the MUSER process providing device emulation. This does not require any changes to QEMU (4.1.0). Obviously this is a massive hack and is done only for the needs of this PoC.
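To make the interception idea concrete, here is a minimal sketch in C of the decision such a wrapper has to make on every open. It is not taken from libpathtrap: the names is_trapped(), connect_unix() and trap_open(), the "/dev/vfio/" prefix match and the explicit socket path argument are all assumptions made for illustration. A real LD_PRELOAD library would export the wrapper under the name open and look up the original with dlsym(RTLD_NEXT, "open"); that part is left out to keep the sketch short.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Assumed prefix; the real library may match other VFIO-related paths too. */
#define TRAP_PREFIX "/dev/vfio/"

/* Return 1 if the path should be redirected to the emulation process. */
int is_trapped(const char *path)
{
    return path != NULL &&
           strncmp(path, TRAP_PREFIX, strlen(TRAP_PREFIX)) == 0;
}

/* Connect to the UNIX domain socket at sock_path; return the fd or -1. */
int connect_unix(const char *sock_path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(sock_path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sock_path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical wrapper: trapped paths yield a socket connected to the
 * emulation process, everything else goes to the ordinary open().
 */
int trap_open(const char *path, int flags, mode_t mode, const char *sock_path)
{
    if (is_trapped(path))
        return connect_unix(sock_path);
    return open(path, flags, mode);
}
```

The caller gets back a file descriptor in both cases, so the subsequent ioctl, read, write and mmap calls on it would also have to be intercepted and translated into messages on the socket.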
The result is a fully working PCI device in QEMU (the gpio sample explained in https://github.com/nutanix/muser/blob/master/README.md#running-gpio-pci-idio-16), which is as simple as possible. I've also tested with a much more complicated device emulation, https://github.com/tmakatos/spdk, which provides NVMe device emulation and requires accessing guest memory for DMA, allowing BAR0 to be memory mapped into the guest, using MSI-X interrupts, etc.

The changes required in MUSER are fairly small: all that is needed is to introduce a new concept of "transport" to receive requests from a UNIX domain socket instead of the kernel (from a character device) and to send/receive file descriptors for sharing memory and firing interrupts.

My experience is that VFIO is so intuitive to use for offloading device emulation from one process to another that it makes this feature quite straightforward. There's virtually nothing specific to the kernel in the VFIO API. Therefore I strongly agree with Stefan's suggestion to use it for device offloading when interacting with QEMU. Using 'muser.ko' is still interesting when QEMU is not the client, but if everyone is happy to proceed with the vfio-over-socket alternative the kernel module can become a second-class citizen. (QEMU is, after all, our first and most relevant client.)

Next I explain how to test the PoC.

Build MUSER with vfio-over-socket:

        git clone --single-branch --branch vfio-over-socket g...@github.com:tmakatos/muser.git
        cd muser/
        git submodule update --init
        make

Run device emulation, e.g.

        ./build/dbg/samples/gpio-pci-idio-16 -s <N>

Where <N> is an available IOMMU group, essentially the device ID, which must not already exist in /dev/vfio/.

Run QEMU using the vfio wrapper library and specifying the MUSER device:

        LD_PRELOAD=muser/build/dbg/libvfio/libvfio.so qemu-system-x86_64 \
                ... \
                -device vfio-pci,sysfsdev=/dev/vfio/<N> \
                -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1073741824 \
                -numa node,nodeid=0,cpus=0,memdev=ram-node0

Bear in mind that since this is just a PoC lots of things can break, e.g. some system call not being intercepted.
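
For reference, the file descriptor passing that the socket transport relies on (for sharing memory and firing interrupts) can be sketched with the standard SCM_RIGHTS ancillary message on a UNIX domain socket. This is the generic mechanism only, not the MUSER wire format: send_fd() and recv_fd() are hypothetical helper names, and a real transport would carry a request header in place of the single placeholder byte used here.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over the connected UNIX domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    /* One byte of ordinary data travels with the control message. */
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from sock. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    /* The kernel has installed a new descriptor in this process. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor refers to the same open file as the sender's, so an eventfd passed this way could be used to signal an interrupt across processes, and a descriptor for a shared memory file could be passed to mmap() on the receiving side.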