Just because we're in a coroutine doesn't imply that we own the AioContext
of the drive being flushed. In that case, use the slow path, which explicitly
enters bdrv_flush_co_entry in the correct AioContext.
Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>
---

We've experienced some lockups in this codepath when taking snapshots of VMs
with drives that have IO-Threads enabled (we have an async 'savevm'
implementation running from a coroutine).

I could not find a reproducer for upstream versions so far, but in testing
this patch fixes all the issues we're seeing, and I think the logic checks out.

The fast-path pattern is repeated a few times in this file, so if this change
makes sense, it's probably worth evaluating the other occurrences as well.

 block/io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index aba67f66b9..ee7310fa13 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2895,8 +2895,9 @@ int bdrv_flush(BlockDriverState *bs)
         .ret = NOT_DONE,
     };
 
-    if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
+    if (qemu_in_coroutine() &&
+        bdrv_get_aio_context(bs) == qemu_get_current_aio_context()) {
+        /* Fast-path if already in coroutine and we own the drive's context */
         bdrv_flush_co_entry(&flush_co);
     } else {
         co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
-- 
2.20.1