On Tue, Dec 15, 2020 at 04:34:05PM +0100, Kevin Wolf wrote:
> On 14.12.2020 at 18:05, Sergio Lopez wrote:
> > There's a cross-dependency between closing the block exports and
> > draining the block layer. The latter needs that we close all export's
> > client connections to ensure they won't queue more requests, but the
> > exports may have coroutines yielding in the block layer, which implies
> > they can't be fully closed until we drain it.
>
> A coroutine that yielded must have some way to be reentered. So I guess
> the question becomes why they aren't reentered until drain. We do
> process events:
>
>     AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
>
> So in theory, anything that would finalise the block export closing
> should still execute.
>
> What is the difference that drain makes compared to a simple
> AIO_WAIT_WHILE, so that coroutines are reentered during drain, but not
> during AIO_WAIT_WHILE?
>
> This is an even more interesting question because the NBD server isn't a
> block node nor a BdrvChildClass implementation, so it shouldn't even
> notice a drain operation.

OK, I took a deeper dive into the issue. While shutting down the guest,
some coroutines from the NBD server are stuck here:

    nbd_trip
    nbd_handle_request
    nbd_do_cmd_read
    nbd_co_send_sparse_read
    blk_pread
    blk_prw
    blk_read_entry
    blk_do_preadv
    blk_wait_while_drained
    qemu_co_queue_wait

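This happens because bdrv_close_all(), where the exports are closed, is
called after bdrv_drain_all_begin(). By then all block backends are
quiesced, so new requests from the NBD server queue up in
blk_wait_while_drained() instead of completing.
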
An alternative approach to this patch would be to move
blk_exp_close_all() to vl.c:qemu_cleanup, before
bdrv_drain_all_begin().

Do you have a preference for one of these options?
Thanks,
Sergio.
