On Thu, Sep 05, 2013 at 10:18:28AM +0900, Jonghwan Choi wrote:

Thanks for posting these details.

Have you tried running x-data-plane=off with vcpu = 8, and how does the
performance compare to x-data-plane=off with vcpu = 1?

> > 1. The fio results so it's clear which cases performed worse and by how
> > much.
>
> When I set vcpu = 8, read performance is decreased about 25%.
> In my test, when vcpu = 1, I got the best performance.

Performance with vcpu = 8 is 25% worse than performance with vcpu = 1?

Can you try pinning threads to host CPUs?  See the libvirt emulatorpin and
vcpupin attributes:
http://libvirt.org/formatdomain.html#elementsCPUTuning

> > 2. The fio job files.
>
> [testglobal]
> description=high_iops
> exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
> group_reporting=1
> rw=read
> direct=1
> ioengine=sync
> bs=4m
> numjobs=1
> size=2048m

A couple of points to check:

1. This test case is synchronous and latency-sensitive; you are not
   benchmarking parallel I/Os, so x-data-plane=on is not expected to perform
   any better than x-data-plane=off.  The point of x-data-plane=on is to let
   smp > 1 guests with parallel I/O scale well.  If the workload does not
   meet both of those conditions, I don't expect you to see any gains over
   x-data-plane=off.

   If you want to try parallel I/Os, I suggest using:

   ioengine=linuxaio
   iodepth=16

2. size=2048m with bs=4m on an SSD drive seems quite small because the test
   would complete quickly.  What is the overall running time of this test?

   In order to collect stable results, it's usually a good idea for the test
   to run for at least a couple of minutes (e.g. 2 minutes minimum).
   Otherwise outliers can influence the results too much.  You may need to
   increase 'size' or use the 'runtime=2m' option instead.

Stefan
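[Editor's note: putting the two fio suggestions above together, a parallel-I/O variant of the posted job file might look like the sketch below. The iodepth, runtime, and time_based values are illustrative choices, not settings from the thread.]

```
[testglobal]
description=high_iops_parallel
exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
group_reporting=1
rw=read
direct=1
ioengine=linuxaio
iodepth=16
bs=4m
numjobs=1
size=2048m
runtime=2m
time_based=1
```

With ioengine=linuxaio and iodepth=16, fio keeps multiple asynchronous reads in flight, which is the kind of parallel workload x-data-plane=on is meant to scale; runtime=2m with time_based=1 keeps the job running long enough to smooth out outliers.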
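[Editor's note: a minimal sketch of the libvirt cputune attributes referenced above, following the domain XML format at the linked page. The host CPU numbers in cpuset are hypothetical and depend on the host topology.]

```xml
<domain type='kvm'>
  <!-- ... -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <!-- pin the QEMU emulator threads to host CPUs 0-1 -->
    <emulatorpin cpuset='0-1'/>
    <!-- pin each guest vcpu to its own host CPU -->
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <!-- ... one vcpupin element per remaining vcpu ... -->
  </cputune>
  <!-- ... -->
</domain>
```

Pinning like this prevents the host scheduler from migrating vcpu and emulator threads across CPUs mid-benchmark, which is one common source of the vcpu = 8 regression being investigated.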
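[Editor's note: a rough sanity check of point 2. The 400 MB/s sequential read rate is an assumed figure for a typical SATA SSD, not a measurement from this thread.]

```python
# Estimate how long the posted job runs: size=2048m, bs=4m, rw=read.
size_mb = 2048          # total bytes read (size=2048m)
block_mb = 4            # block size (bs=4m)
throughput_mb_s = 400   # ASSUMED sequential read rate of the SSD

num_ios = size_mb // block_mb           # sequential 4 MiB reads issued
est_runtime_s = size_mb / throughput_mb_s

print(num_ios)          # 512 reads in total
print(est_runtime_s)    # about 5 seconds, far below the ~120 s suggested
```

At only a few seconds of runtime, a single background flush or readahead hiccup can dominate the result, which is why increasing 'size' or using 'runtime=2m' is suggested above.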
