Paolo Bonzini <[email protected]> writes: > Il 09/10/2012 20:26, Anthony Liguori ha scritto: >> Paolo Bonzini <[email protected]> writes: >> >>> Il 09/10/2012 17:37, Anthony Liguori ha scritto: >>>>>>>> In the very short term, I can imagine an aio fastpath that was only >>>>>>>> implemented in terms of the device API. We could have a slow path that >>>>>>>> acquired the BQL. >>>>>> >>>>>> Not sure I follow. >>>> >>>> As long as the ioeventfd thread can acquire qemu_mutex in order to call >>>> bdrv_* functions. The new device-only API could do this under the >>>> covers for everything but the linux-aio fast path initially. >>> >>> Ok, so it's about the locking. I'm not even sure we need locking if we >>> have cooperative multitasking. For example if bdrv_aio_readv/writev >>> is called from a VCPU thread, it can just schedule a bottom half for >>> itself in the appropriate AioContext. Similarly for block jobs. >> >> Okay, let's separate out the two issues here though. One is whether we >> need a device specific block API. The second is whether we should short >> cut to a fast path in the short term and go after a fully unlocked bdrv_ >> layer in the long(shortish?) term. >> >> So let's talk about your proposal... >> >>> The only part where I'm not sure how it would work is bdrv_read/write, >>> because of the strange "qemu_aio_wait() calls select with a lock taken". >>> Maybe we can just forbid synchronous I/O if you set a non-default >>> AioContext. >> >> Not sure how practical that is. The is an awful lot of sync I/O still left. > > Hmm, yeah, perhaps we need to bite the bullet and use a recursive lock. > The lock would be taken by: > > - sync I/O ops > > - monitor commands that currently call bdrv_drain_all > > - aio_poll when calling bottom halves or handlers > > The rest of the proposal however would stand (especially with reference > to block jobs). > > I think we can proceed incrementally. The first obvious step is to > s/qemu_bh_new/aio_bh_new/ in the whole block layer (including the > CoQueue stuff), which would also help fixing the qemu-char bug that Jan > reported. > >>> This would be entirely hidden in the block layer. For example the >>> following does it for bdrv_aio_readv/writev: >>> >>> diff --git a/block.c b/block.c >>> index e95f613..7165e82 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = { >>> .cancel = bdrv_aio_co_cancel_em, >>> }; >>> >>> -static void bdrv_co_em_bh(void *opaque) >>> -{ >>> - BlockDriverAIOCBCoroutine *acb = opaque; >>> - >>> - acb->common.cb(acb->common.opaque, acb->req.error); >>> - qemu_bh_delete(acb->bh); >>> - qemu_aio_release(acb); >>> -} >>> - >>> /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */ >>> static void coroutine_fn bdrv_co_do_rw(void *opaque) >>> { >>> @@ -3735,8 +3726,17 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque) >>> acb->req.nb_sectors, acb->req.qiov, 0); >>> } >>> >>> - acb->bh = qemu_bh_new(bdrv_co_em_bh, acb); >>> - qemu_bh_schedule(acb->bh); >>> + acb->common.cb(acb->common.opaque, acb->req.error); >>> + qemu_aio_release(acb); >>> +} >>> + >>> +static void bdrv_co_em_bh(void *opaque) >>> +{ >>> + BlockDriverAIOCBCoroutine *acb = opaque; >>> + >>> + qemu_bh_delete(acb->bh); >>> + co = qemu_coroutine_create(bdrv_co_do_rw); >>> + qemu_coroutine_enter(co, acb); >>> } >>> >>> static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs, >>> @@ -3756,8 +3756,8 @@ static BlockDriverAIOCB >>> *bdrv_co_aio_rw_vector(BlockDriverState *bs, >>> acb->req.qiov = qiov; >>> acb->is_write = is_write; >>> >>> - co = qemu_coroutine_create(bdrv_co_do_rw); >>> - qemu_coroutine_enter(co, acb); >>> + acb->bh = qemu_bh_new(bdrv_co_em_bh, acb); >>> + qemu_bh_schedule(acb->bh); >>> >>> return &acb->common; >>> } >>> >>> >>> Then we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which >>> would run outside the bottom half and provide the desired fast path. >> >> This works for some of the block layer I think. How does this interact >> with thread pools for AIO? >> >> But this wouldn't work well with things like NBD or curl, right? What's >> the plan there? > > NBD uses coroutines; curl can use the non-unlocked > bdrv_aio_readv/writev. In both cases they would execute in the > dataplane thread. qcow2-over-raw would also execute its read/write code > entirely from the dataplane thread, for example.
Does that mean that we'd stop processing the queue if we're waiting for an I/O completion to handle meta data operations? If that's the case, that probably will hurt performance overall. Regards, Anthony Liguori > > Paolo
