On 2/2/26 11:25 AM, Pierrick Bouvier wrote:
> On 2/2/26 10:52 AM, Stefan Hajnoczi wrote:
>> This command-line lets you benchmark virtio-blk without actual I/O
>> slowing down the request processing:
>>
>> qemu-system-x86_64 \
>>     -M accel=kvm \
>>     -cpu host \
>>     -m 4G \
>>     --blockdev file,node-name=drive0,filename=boot.img,cache.direct=on,aio=native \
>>     --blockdev null-co,node-name=drive1,size=$((10 * 1024 * 1024 * 1024)) \
>>     --object iothread,id=iothread0 \
>>     --device virtio-blk-pci,drive=drive0,iothread=iothread0 \
>>     --device virtio-blk-pci,drive=drive1,iothread=iothread0
>>
>> Here is a fio command-line for 4 KiB random reads:
>>
>> fio \
>>     --ioengine=libaio \
>>     --direct=1 \
>>     --runtime=30 \
>>     --ramp_time=10 \
>>     --rw=randread \
>>     --bs=4k \
>>     --iodepth=128 \
>>     --filename=/dev/vdb \
>>     --name=randread
>>
>> This is just a single vCPU, but it should be enough to see if there is
>> any difference in I/O Operations Per Second (IOPS) or efficiency
>> (IOPS/CPU utilization).
>
> Thanks very much for the info, Stefan. I didn't even know about the
> null-co blockdev, so it definitely would have taken some time to find
> all this.
>
> For what it's worth, I automated the benchmark here (podman is needed
> for the build), so it can be reused for future changes:
> https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64_io_benchmark
>
> ./build.sh && ./run.sh path/to/qemu-system-x86_64
>
> My initial testing showed a 50% slowdown, which was more than
> surprising. After profiling, the extra time is spent here:
> https://github.com/qemu/qemu/blob/587f4a1805c83a4e1d59dd43cb14e0a834843d1d/target-info.c#L30
>
> When we merged target-info, there were several versions over a long
> time, and I was 100% sure we were updating the target_info structure
> instead of reparsing target_name every time. Unfortunately, that's not
> the case. I'll fix that. With that fix, there is no difference in
> performance (<1%).
>
> I'll respin a v4 with the target-info fix, the initial v1 changes, and
> benchmark results.
>
> Thanks for pointing out the performance issue; there was one for sure.
>
> Regards,
> Pierrick
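
To recap the fix: the idea is to parse target_name() once and cache the
result, instead of reparsing it on every call. As a rough sketch only (the
exact function, type and field names in target-info.c may differ from what
I write here):

/*
 * Sketch only: cache the parsed target instead of re-running
 * qapi_enum_parse() on target_name() for every call. Names are
 * illustrative and may not match the actual target-info.c code.
 */
SysEmuTarget target_arch(void)
{
    static SysEmuTarget arch = SYS_EMU_TARGET__MAX; /* not parsed yet */

    if (arch == SYS_EMU_TARGET__MAX) {
        arch = qapi_enum_parse(&SysEmuTarget_lookup, target_name(),
                               SYS_EMU_TARGET__MAX, &error_abort);
    }
    return arch;
}
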
After proper benchmarking, I get these results (in kIOPS) over 20 runs, all including the target_arch fix I mentioned:

reference: mean=239.2, var=12.16
v1: mean=232.2, var=37.06
v4-wip (optimized virtio_access_is_big_endian): mean=235.05, var=18.64

So basically, we have a ~1.7% performance hit on a torture benchmark ((239.2 - 235.05) / 239.2 ≈ 1.7%, comparing v4-wip to the reference). Is that acceptable to you, or should we dig more?

Regards,
Pierrick
