Fiona Ebner <[email protected]> wrote:
> The bdrv_create_dirty_bitmap() function (which is also called by
> bdrv_dirty_bitmap_create_successor()) uses bdrv_getlength(bs). This is
> a wrapper around a coroutine, and when not called in coroutine context
> would use bdrv_poll_co(). Such a call would trigger an assert() if the
> correct AioContext hasn't been acquired before, because polling would
> try to release the AioContext.
The ingenous in me thinks:
If the problem is that bdrv_poll_co() release an AioContext that it
don't have acquired, perhaps we should fix bdrv_poll_co().
Ha!!!
$ find . -type f -exec grep --color=auto -nH --null -e bdrv_poll_co \{\} +
./scripts/block-coroutine-wrapper.py\0173:
bdrv_poll_co(&s.poll_state);
./scripts/block-coroutine-wrapper.py\0198:
bdrv_poll_co(&s.poll_state);
./block/block-gen.h\038:static inline void bdrv_poll_co(BdrvPollCo *s)
$
/me retreats
> The issue does not happen for migration, because the call happens
> from process_incoming_migration_co(), i.e. in coroutine context. So
> the bdrv_getlength() wrapper will just call bdrv_co_getlength()
> directly without polling.
The ingenous in me wonders why bdrv_getlength() needs to use coroutines
at all, but as I have been burned on the previous paragraph, I learn not
to even try.
Ok, I never learn, so I do a grep and I see two appearces of
bdrv_getlength in include files, but grep only shows uses of the
function, not a real definition.
I even see
co_wrapper_mixed_bdrv_rdlock
And decied to stop here.
> The issue would happen for snapshots, but won't in practice, because
> saving a snapshot with a block dirty bitmap is currently not possible.
> The reason is that dirty_bitmap_save_iterate() returns whether it has
> completed the bulk phase, which only happens in postcopy, so
> qemu_savevm_state_iterate() will always return 0, meaning the call
> to iterate will be repeated over and over again without ever reaching
> the completion phase.
>
> Still, this would make the code more robust for the future.
What I wonder is if we should annotate this calls somehow that they need
to be called from a coroutine context, because I would have never found
it.
And just thinking loud:
I still wonder if we should make incoming migration its own thread.
Because we got more and more problems because it is a coroutine, that in
non-multifd case can consume a whole CPU alone, so it makes no sense to
have a coroutine.
On the other hand, with multifd, it almost don't use any resources, so ....
> Signed-off-by: Fiona Ebner <[email protected]>
> ---
>
> We ran into this issue downstream, because we have a custom snapshot
> mechanism which does support dirty bitmaps and does not run in
> coroutine context during load.
>
> migration/block-dirty-bitmap.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 032fc5f405..e1ae3b7316 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -805,8 +805,11 @@ static int dirty_bitmap_load_start(QEMUFile *f,
> DBMLoadState *s)
> "destination", bdrv_dirty_bitmap_name(s->bitmap));
> return -EINVAL;
> } else {
> + AioContext *ctx = bdrv_get_aio_context(s->bs);
> + aio_context_acquire(ctx);
> s->bitmap = bdrv_create_dirty_bitmap(s->bs, granularity,
> s->bitmap_name, &local_err);
> + aio_context_release(ctx);
> if (!s->bitmap) {
> error_report_err(local_err);
> return -EINVAL;
I was going to suggest to put that code inside brdv_create_dirty_bitmap,
because migration come is already complex enough without knowing about
AioContext, but it appears that this pattern is already used in several
places.
> @@ -833,7 +836,10 @@ static int dirty_bitmap_load_start(QEMUFile *f,
> DBMLoadState *s)
>
> bdrv_disable_dirty_bitmap(s->bitmap);
> if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
> + AioContext *ctx = bdrv_get_aio_context(s->bs);
> + aio_context_acquire(ctx);
> bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
> + aio_context_release(ctx);
> if (local_err) {
> error_report_err(local_err);
> return -EINVAL;
for(i=1; i < 1000; i++)
printf(""I will not grep what a successor of a dirty bitmap
is\n");
Later, Juan.