On 03.09.2025 at 20:11, Brian Song wrote:
>
>
> On 9/3/25 5:49 AM, Stefan Hajnoczi wrote:
> > On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> > > We used fio to test a 1 GB file under both traditional FUSE and
> > > FUSE-over-io_uring modes. The experiments were conducted with the
> > > following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> > > with 70% read and 30% write, resulting in a total of eight test cases,
> > > measuring both latency and throughput.
> > >
> > > Test results:
> > >
> > > https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4
> >
> > Hanna: You benchmarked the FUSE export coroutine implementation a little
> > while ago. What do you think about these results with
> > FUSE-over-io_uring?
> >
> > What stands out to me is that iodepth=1 numjobs=4 already saturates the
> > system, so increasing iodepth to 64 does not improve the results much.
> >
> > Brian: What is the qemu-storage-daemon command-line for the benchmark
> > and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
> > with an XFS file system mounted from the FUSE image)?
>
> QMP script:
> https://gist.github.com/hibriansong/399f9564a385cfb94db58669e63611f8
>
> Or:
> ### NORMAL
> ./qemu/build/storage-daemon/qemu-storage-daemon \
> --object iothread,id=iothread1 \
> --object iothread,id=iothread2 \
> --object iothread,id=iothread3 \
> --object iothread,id=iothread4 \
> --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \

This uses the default AIO mode and, most importantly, the default cache
mode, which means that the host kernel page cache is used. This makes it
hard to tell how much of the I/O only hit RAM on the host and how much
really went to the disk, so the results are difficult to interpret
correctly. For benchmarks, it's generally best to use cache.direct=on,
and I think I'd also prefer aio=native (or aio=io_uring).
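
For example, something along these lines for the file node (untested, just
to show which options I mean; the rest of the command line stays the same):

  --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2,cache.direct=on,aio=native \
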
> --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
> --export
> type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
>
> ### URING
> echo Y > /sys/module/fuse/parameters/enable_uring
>
> ./qemu/build/storage-daemon/qemu-storage-daemon \
> --object iothread,id=iothread1 \
> --object iothread,id=iothread2 \
> --object iothread,id=iothread3 \
> --object iothread,id=iothread4 \
> --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
> --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
> --export
> type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
>
> ubuntu.qcow2 has been preallocated and enlarged to 100GB by
>
> $ qemu-img resize ubuntu.qcow2 100G

I think this doesn't preallocate the newly added space; you should add
--preallocation=falloc at least.
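
That is, something like this instead (untested):

  $ qemu-img resize --preallocation=falloc ubuntu.qcow2 100G
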
> $ virt-customize \
> --run-command '/bin/bash /bin/growpart /dev/sda 1' \
> --run-command 'resize2fs /dev/sda1' -a ubuntu.qcow2
>
> The image file, formatted with an Ext4 filesystem, was mounted on /mnt/tmp
> on my PC equipped with a Kingston PCIe 4.0 NVMe SSD
>
> $ sudo kpartx -av mount-point
> $ sudo mount /dev/mapper/loop31p1 /mnt/tmp/
>
>
> Unmount the partition after you're done using it.
>
> $ sudo umount /mnt/tmp
> # sudo kpartx -dv mount-point

What I would personally use to benchmark performance is just a clean
preallocated raw image without a guest on it. I wouldn't even partition it
or necessarily put a filesystem on it, but just run the benchmark directly
on the FUSE export's mountpoint. The other thing I'd consider for
benchmarking is the null-co block driver, so that the FUSE overhead really
dominates and isn't dwarfed by a slow disk. (A null block device is also
one where you can't have a filesystem even if you wanted to.)
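
For example, roughly like this (an untested sketch; the node name, the
10 GiB null-co size and the fio parameters are just placeholders chosen to
mirror your runs above):

  $ touch mount-point
  $ ./qemu/build/storage-daemon/qemu-storage-daemon \
      --blockdev node-name=null-node,driver=null-co,size=10737418240,read-zeroes=on \
      --export type=fuse,id=exp0,node-name=null-node,mountpoint=mount-point,writable=on

  $ fio --name=fuse-bench --filename=mount-point --direct=1 --ioengine=io_uring \
        --rw=randrw --rwmixread=70 --bs=4k --iodepth=64 --numjobs=4 \
        --runtime=60 --time_based --group_reporting

That way fio talks through the FUSE export to a block driver that does
essentially no work, so the numbers are dominated by the export path itself.
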
Kevin

> > > On 8/29/25 10:50 PM, Brian Song wrote:
> > > > Hi all,
> > > >
> > > > This is a GSoC project. More details are available here:
> > > > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> > > >
> > > > This patch series includes:
> > > > - Add a round-robin mechanism to distribute the kernel-required Ring
> > > >   Queues to FUSE Queues
> > > > - Support multiple in-flight requests (multiple ring entries)
> > > > - Add tests for FUSE-over-io_uring
> > > >
> > > > More detail in the v2 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> > > >
> > > > And in the v1 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> > > >
> > > >
> > > > Brian Song (4):
> > > >   export/fuse: add opt to enable FUSE-over-io_uring
> > > >   export/fuse: process FUSE-over-io_uring requests
> > > >   export/fuse: Safe termination for FUSE-uring
> > > >   iotests: add tests for FUSE-over-io_uring
> > > >
> > > >  block/export/fuse.c                  | 838 +++++++++++++++++++++------
> > > >  docs/tools/qemu-storage-daemon.rst   |  11 +-
> > > >  qapi/block-export.json               |   5 +-
> > > >  storage-daemon/qemu-storage-daemon.c |   1 +
> > > >  tests/qemu-iotests/check             |   2 +
> > > >  tests/qemu-iotests/common.rc         |  45 +-
> > > >  util/fdmon-io_uring.c                |   5 +-
> > > >  7 files changed, 717 insertions(+), 190 deletions(-)
> > > >