On Tue, Jul 8, 2014 at 9:07 PM, Christian Borntraeger <borntrae...@de.ibm.com> wrote: > On 08/07/14 19:08, Paolo Bonzini wrote: >> Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto: >>> I sent Christian an initial patch to fix this but now both threads are >>> stuck in rfifolock_lock() inside cond wait. That's very strange and >>> should never happen. >> >> I had this patch pending for 2.2: >> >> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f >> Author: Paolo Bonzini <pbonz...@redhat.com> >> Date: Mon Jul 7 10:39:49 2014 +0200 >> >> iothread: do not rely on aio_poll(ctx, true) result to end a loop >> >> Currently, whenever aio_poll(ctx, true) has completed all pending >> work it returns true *and* the next call to aio_poll(ctx, true) >> will not block. >> >> This invariant has its roots in qemu_aio_flush()'s implementation >> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does >> not exist anymore and bdrv_drain_all() is implemented differently; >> and this invariant is complicated to maintain and subtly different >> from the return value of GMainLoop's g_main_context_iteration. >> >> All calls to aio_poll(ctx, true) except one are guarded by a >> while() loop checking for a request to be incomplete, or a >> BlockDriverState to be idle. Modify that one exception in >> iothread.c. >> >> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> > > The hangs are gone. Looks like 2.1 material now... > > Acked-by: Christian Borntraeger <borntrae...@de.ibm.com> > Tested-by: Christian Borntraeger <borntrae...@de.ibm.com> > > > > >> >> diff --git a/iothread.c b/iothread.c >> index 1fbf9f1..d9403cf 100644 >> --- a/iothread.c >> +++ b/iothread.c >> @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass; >> static void *iothread_run(void *opaque) >> { >> IOThread *iothread = opaque; >> + bool blocking; >> >> qemu_mutex_lock(&iothread->init_done_lock); >> iothread->thread_id = qemu_get_thread_id(); >> @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque) >> >> while (!iothread->stopping) { >> aio_context_acquire(iothread->ctx); >> - while (!iothread->stopping && aio_poll(iothread->ctx, true)) { >> + blocking = true; >> + while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) { >> /* Progress was made, keep going */ >> + blocking = false; >> } >> aio_context_release(iothread->ctx); >> } >> >> Christian, can you test it?
Could affect performance because of the extra poll/release/acquire but a clean solution for broken iothread_run(). Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com> Good for 2.1