On 2/2/26 11:25 AM, Pierrick Bouvier wrote:
On 2/2/26 10:52 AM, Stefan Hajnoczi wrote:

This command-line lets you benchmark virtio-blk without actual I/O
slowing down the request processing:

    qemu-system-x86_64 \
        -M accel=kvm \
        -cpu host \
        -m 4G \
        --blockdev file,node-name=drive0,filename=boot.img,cache.direct=on,aio=native \
        --blockdev null-co,node-name=drive1,size=$((10 * 1024 * 1024 * 1024)) \
        --object iothread,id=iothread0 \
        --device virtio-blk-pci,drive=drive0,iothread=iothread0 \
        --device virtio-blk-pci,drive=drive1,iothread=iothread0

Here is a fio command-line for 4 KiB random reads:

    fio \
        --ioengine=libaio \
        --direct=1 \
        --runtime=30 \
        --ramp_time=10 \
        --rw=randread \
        --bs=4k \
        --iodepth=128 \
        --filename=/dev/vdb \
        --name=randread

This is just a single vCPU, but it should be enough to see if there is
any difference in I/O Operations Per Second (IOPS) or efficiency
(IOPS/CPU utilization).

Thanks very much for the info, Stefan. I didn't even know about the
null-co blockdev, so it definitely would have taken some time to find
all this.

For what it's worth, I automated the benchmark here (podman is needed
for the build), so it can be reused for future changes:
https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64_io_benchmark

./build.sh && ./run.sh path/to/qemu-system-x86_64

My initial testing showed a 50% slowdown, which was quite surprising.
After profiling, the extra time turned out to be spent here:
https://github.com/qemu/qemu/blob/587f4a1805c83a4e1d59dd43cb14e0a834843d1d/target-info.c#L30

When we merged target-info, it went through several versions over a long
period, and I was 100% sure we ended up updating the target_info
structure instead of reparsing the target_name every time. Unfortunately,
that's not the case. I'll fix that.
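
To illustrate the shape of the fix (a minimal sketch with made-up names,
not the actual target-info.c code): the idea is to parse the target name
once and cache the result, instead of re-deriving it from the string on
every call, since this sits on the virtio request hot path.

    /* Hypothetical sketch, not the real QEMU code: cache the parsed
     * value instead of reparsing the target name on every call. */
    #include <stdbool.h>
    #include <string.h>

    typedef enum { ARCH_UNKNOWN, ARCH_X86_64, ARCH_AARCH64 } TargetArch;

    static TargetArch parse_arch(const char *name)
    {
        if (!strcmp(name, "x86_64")) {
            return ARCH_X86_64;
        }
        if (!strcmp(name, "aarch64")) {
            return ARCH_AARCH64;
        }
        return ARCH_UNKNOWN;
    }

    /* Before: string comparisons on every call (hot path). */
    TargetArch target_arch_reparsed(const char *target_name)
    {
        return parse_arch(target_name);
    }

    /* After: parse once on first use, then return the cached value. */
    TargetArch target_arch_cached(const char *target_name)
    {
        static bool cached_valid;
        static TargetArch cached;

        if (!cached_valid) {
            cached = parse_arch(target_name);
            cached_valid = true;
        }
        return cached;
    }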

With that fix, there is no difference in performance (<1%).

I'll respin a v4 with the target-info fix, the initial v1 changes, and
benchmark results.

Thanks for pointing out the performance issue; there was one for sure.

Regards,
Pierrick

After proper benchmarking, I get these results (in kIOPS) over 20 runs,
all including the target_arch fix I mentioned.

reference: mean=239.2 var=12.16
v1: mean=232.2 var=37.06
v4-wip (optimized virtio_access_is_big_endian): mean=235.05 var=18.64

So basically, we have a ~1.7% performance hit ((239.2 - 235.05) / 239.2)
on a torture benchmark.
Is that acceptable to you, or should we dig further?

Regards,
Pierrick
