On Tue, Jul 19, 2016 at 02:27:40PM +0200, Roman Pen wrote:
> v2:
>   o For the third patch, do not introduce an extra member in the LinuxAioState
>     structure; reuse ret == -EINPROGRESS.
>
>   o Add an explicit comment which explains why we do not hang if requests
>     are still pending.
>
>
> This series is intended to reduce completion latencies with two changes:
>
>   1. QEMU does not use any timeout value for harvesting completed AIO
>      requests from the ring buffer, thus io_getevents() can be implemented
>      in userspace (first patch).
>
>   2. In order to reduce completion latency it makes sense to harvest completed
>      requests ASAP. A very fast backend device can complete requests just after
>      submission, so it is worth checking the ring buffer and peeking at completed
>      requests directly after io_submit() has been called (third patch).
>
> Indeed, the series reduces completion latencies and increases the
> overall throughput. For example, these are the percentiles of the number of
> requests completed at once:
>
>          1st  10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
> Before     2     4    42   112   128   128   128   128   128   128      128
> After      1     1     4    14    33    45    47    48    50    51      108
>
> This means that before the third patch is applied, the ring buffer is
> observed as full (128 requests consumed at once) in 60% of calls.
>
> After the third patch is applied, the distribution of the number of completed
> requests is "smoother" and the queue (requests in flight) is almost never
> full.
>
> The fio read results are the following (write results are almost the
> same and are not shown here):
>
> Before
> ------
> job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
>   Description  : [Emulation of Storage Server Access Pattern]
>   read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
>     slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
>     clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
>      lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
>     clat percentiles (usec):
>      |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
>      | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
>      | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
>      | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
>      | 99.99th=[ 4704]
>     bw (KB /s): min=205229, max=553181, per=12.50%, avg=233278.26,
>                 stdev=18383.51
>
> After
> ------
> job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
>   Description  : [Emulation of Storage Server Access Pattern]
>   read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
>     slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
>     clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
>      lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
>     clat percentiles (usec):
>      |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
>      | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
>      | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
>      | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
>      | 99.99th=[ 5408]
>     bw (KB /s): min=212149, max=390160, per=12.49%, avg=245746.04,
>                 stdev=11606.75
>
> Throughput increased from 1822MB/s to 1921MB/s, and average completion
> latency decreased from 1051us to 988us.
>
> Roman Pen (3):
>   linux-aio: consume events in userspace instead of calling io_getevents
>   linux-aio: split processing events function
>   linux-aio: process completions from ioq_submit()
>
>  block/linux-aio.c | 178 ++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 141 insertions(+), 37 deletions(-)
>
> Signed-off-by: Roman Pen <[email protected]>
> Cc: Stefan Hajnoczi <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: [email protected]

Thanks, applied to my block-next tree for QEMU 2.8:
https://github.com/stefanha/qemu/commits/block-next

Stefan
