On Mon, Feb 11, 2013 at 1:42 PM, Paolo Bonzini <[email protected]> wrote: > Il 11/02/2013 13:29, Kevin Wolf ha scritto: >> The bug is not in this function but in process_incoming_migration(). It >> should never blindly enter a coroutine which is in an unknown state. > > process_incoming_migration() is the function that _creates_ the > coroutine and enters it for the first time, so there shouldn't be any > problem there. > > The function we're looking at should be, as you write correctly, > enter_migration_coroutine(). > >> If it can reenter here, it can reenter anywhere in block drivers, and >> adding a workaround to one place that just yields again if it got an >> early reentrance doesn't fix the real bug. >> >> Which is the yield that corresponds to the enter in >> enter_migration_coroutine()? We need to add some state that can be used >> to make sure the enter happens only for this specific yield. > > In this case the corresponding yield is in qemu-coroutine-io.c, and it > is driven by read-availability of the migration file descriptor. So it > is possible to test before entering the coroutine, but really ugly (not > just because it would cost one extra system call). > > But why is enter_migration_coroutine() being called at all? bdrv_write > should be a synchronous call, it should run its own qemu_aio_wait() > event loop and that loop doesn't include the migration file descriptor.
qemu_aio_wait() is not called because we are already in a coroutine - the migration coroutine. bdrv_write() yields instead of looping on qemu_aio_wait(). If we want coroutine_fn to be composable, they must handle spurious entries. We could add code to clear the socket fd handler while we're not waiting for it but I think handling spurious entries is the most robust solution. Stefan
