On Thursday, July 1, 2021 4:08 AM, Peter Xu wrote:
> Taking the mutex for every single dirty bit to clear is too slow, especially
> since we take and release it even when the dirty bit is already clear. So far
> the mutex is only used to synchronize the special case of
> qemu_guest_free_page_hint() against the migration thread, nothing really that
> serious yet. Let's move the lock up into the callers.
>
> There're two callers of migration_bitmap_clear_dirty().
>
> For migration, move it into ram_save_iterate(). With the help of the MAX_WAIT
> logic, ram_save_iterate() runs for no more than roughly 50ms at a time, so we
> only take the lock once at its entry. It also means any call to
> qemu_guest_free_page_hint() can be delayed; but that should be very rare, only
> during migration, and I don't see a problem with it.
>
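
(Just to check my understanding of the migration side: the change boils down to
roughly the sketch below. This is my reading of the description, not the
literal hunk, and the loop condition is simplified; the real loop also honours
rate limiting and the ~50ms MAX_WAIT cap.)

/* In ram_save_iterate(): take the mutex once for the whole bounded
 * iteration; migration_bitmap_clear_dirty() then runs with the caller
 * already holding rs->bitmap_mutex instead of locking per page. */
bool done = false;

qemu_mutex_lock(&rs->bitmap_mutex);
while (!done) {
    int pages = ram_find_and_save_block(rs, false); /* clears dirty bits */

    done = (pages <= 0); /* simplified stop condition */
}
qemu_mutex_unlock(&rs->bitmap_mutex);
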
> For COLO, move it up to colo_flush_ram_cache(). I think COLO forgot to take
> that lock even when calling ramblock_sync_dirty_bitmap(); compare
> migration_bitmap_sync(), which takes it correctly. So let the mutex cover both
> the ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls.
>
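
(And for COLO, I read this as widening the critical section roughly as below;
again only a sketch of my understanding, not the actual hunk.)

/* In colo_flush_ram_cache(): the mutex now also covers the bitmap sync,
 * mirroring what migration_bitmap_sync() already does. */
RAMBlock *block;

qemu_mutex_lock(&ram_state->bitmap_mutex);
WITH_RCU_READ_LOCK_GUARD() {
    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        ramblock_sync_dirty_bitmap(ram_state, block);
    }
}
/* ... then the existing flush loop, which ends up calling
 * migration_bitmap_clear_dirty(), still under the same mutex ... */
qemu_mutex_unlock(&ram_state->bitmap_mutex);
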
> It would even be possible to drop the lock and use atomic operations on
> rb->bmap and the migration_dirty_pages counter. I didn't do that, partly to
> stay on the safe side and partly because it's hard to predict whether such
> frequent atomic ops would bring overhead of their own, e.g. on huge VMs where
> this happens very often. If that ever becomes a real problem, we can keep a
> local counter and apply the atomic ops periodically. Keep it simple for now.
>
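
(For reference, the lockless direction described above would presumably look
something like this; it is only an illustration of the idea, not part of the
patch, and dirty_clear_batch is a made-up RAMState field for batching the
counter update.)

/* Clear one dirty bit without holding bitmap_mutex (illustration only). */
static bool clear_dirty_lockless(RAMState *rs, RAMBlock *rb,
                                 unsigned long page)
{
    unsigned long old = qatomic_fetch_and(&rb->bmap[BIT_WORD(page)],
                                          ~BIT_MASK(page));

    if (!(old & BIT_MASK(page))) {
        return false; /* bit was already clear */
    }
    /* Batch updates to the shared counter to limit atomic traffic. */
    if (++rs->dirty_clear_batch >= 64) {
        qatomic_sub(&rs->migration_dirty_pages, rs->dirty_clear_batch);
        rs->dirty_clear_batch = 0;
    }
    return true;
}
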
If the free page optimization is enabled, a waiting time of up to 50ms might be
too long for handling just one hint (via qemu_guest_free_page_hint)?
How about taking the lock conditionally?
e.g. (untested sketch):
/* Untested sketch; relies on the g_autoptr() cleanup behind QEMU_LOCK_GUARD
 * being skipped for NULL, so the guard simply isn't armed when @cond is false. */
#define QEMU_LOCK_GUARD_COND(lock, cond)                                   \
    g_autoptr(QemuLockable) glue(qemu_lockable_auto, __COUNTER__)          \
        G_GNUC_UNUSED = (cond) ?                                           \
            qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((lock))) : NULL
Then in migration_bitmap_clear_dirty:
QEMU_LOCK_GUARD_COND(&rs->bitmap_mutex, rs->fpo_enabled);
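
(The ternary form is deliberate: wrapping QEMU_LOCK_GUARD in an if-block would
drop the mutex at the end of that block rather than at the end of
migration_bitmap_clear_dirty(). If relying on the NULL cleanup feels too
subtle, an explicit qemu_mutex_lock()/qemu_mutex_unlock() pair guarded by
rs->fpo_enabled would do the same.)
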
Best,
Wei