On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <[email protected]> wrote:

> On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <[email protected]> wrote:
> >
> > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > Yes, invoking migration_bitmap_sync_precopy more frequently was also
> > > > my first idea, but it involves bitmap updating and interferes with
> > > > the behavior of page sending; it also affects the migration
> > > > information stats and interferes with other migration logic such as
> > > > migration_update_rates().
> > >
> > > Could you elaborate?
> > >
> > > For example, what happens if we start to sync in ram_save_iterate() for
> > > some time intervals (e.g. 5 seconds)?
> > >
> >
> > I didn't try to sync in ram_save_iterate but in the
> > migration_bitmap_sync_precopy.
> >
> > If we call migration_bitmap_sync_precopy from the ram_save_iterate
> > function, the approach seems to be correct. However, the bitmap will be
> > updated while the migration thread iterates through each dirty page in
> > the RAMBlock list. Compared to the existing implementation, this is
> > different but still straightforward; I'll give it a shot soon to see if
> > it works.
>
> It's still serialized in the migration thread, so I'd expect it is similar
>

What does "serialized" mean?

How about we:
1. invoke migration_bitmap_sync_precopy in a timer (bg_sync_timer) hook,
   every 5 seconds.
2. register the bg_sync_timer in the main loop when the machine starts, like
   throttle_timer.
3. activate the timer when ram_save_iterate gets called and deactivate it
   gracefully in ram_save_cleanup during migration.

I think it is simple enough and also isn't "serialized"?
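
To make the three steps concrete, here is a rough sketch of the timer
lifecycle in plain C. It is only a model under stated assumptions: it does
not use QEMU's timer API, and BgSyncTimer, the bg_sync_timer_*() helpers and
count_sync() are hypothetical names made up for illustration. The callback
stands in for whatever wrapper would call migration_bitmap_sync_precopy, and
bg_sync_timer_poll() stands in for the main loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BG_SYNC_INTERVAL_MS 5000 /* "every 5 seconds" from step 1 */

typedef void (*BgSyncFn)(void *opaque);

/* Hypothetical model of bg_sync_timer; not a QEMU type. */
typedef struct BgSyncTimer {
    BgSyncFn cb;       /* stand-in for the bitmap sync hook */
    void *opaque;
    int64_t expire_ms; /* next deadline, only meaningful while armed */
    bool armed;
} BgSyncTimer;

/* Step 2: register once at machine start; the timer stays idle. */
static void bg_sync_timer_register(BgSyncTimer *t, BgSyncFn cb, void *opaque)
{
    assert(t != NULL && cb != NULL);
    t->cb = cb;
    t->opaque = opaque;
    t->expire_ms = 0;
    t->armed = false;
}

/*
 * Step 3, first half: arm the timer from the ram_save_iterate path.
 * A no-op when already armed, so repeated iterate calls do not keep
 * pushing the deadline out.
 */
static void bg_sync_timer_activate(BgSyncTimer *t, int64_t now_ms)
{
    if (!t->armed) {
        t->armed = true;
        t->expire_ms = now_ms + BG_SYNC_INTERVAL_MS;
    }
}

/* Step 3, second half: disarm from the ram_save_cleanup path. */
static void bg_sync_timer_deactivate(BgSyncTimer *t)
{
    t->armed = false;
}

/*
 * Main-loop stand-in: run the hook if the deadline has passed and
 * re-arm for the next interval. Returns true when the hook ran.
 */
static bool bg_sync_timer_poll(BgSyncTimer *t, int64_t now_ms)
{
    if (!t->armed || now_ms < t->expire_ms) {
        return false;
    }
    t->cb(t->opaque);
    t->expire_ms = now_ms + BG_SYNC_INTERVAL_MS;
    return true;
}

/* Hypothetical hook that only counts how many syncs were requested. */
static void count_sync(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model the hook can only fire between activate and deactivate, so
nothing runs outside of a migration. A check for auto converge being enabled
could be placed in the activate step.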

> to e.g. ->state_pending_exact() calls when QEMU flushed most dirty pages in
> the current bitmap.
>
> >
> >
> > > Btw, we shouldn't have this extra sync exist if auto converge is
> > > disabled no matter which way we use, because it's pure overhead when
> > > auto converge is not in use.
> > >
> >
> > Ok, I'll add the check in the next version.
>
> Let's start with simple, and if there's anything unsure we can discuss
> upfront, just to avoid coding something and change direction later.  Again,
> personally I think we shouldn't add too much new code to auto converge
> (unless very well justified, but I think it's just hard.. fundamentally with
> any pure throttling solutions), hopefully something small can make it start
> to work for huge VMs.
>
> Thanks,
>
> --
> Peter Xu
>
>

-- 
Best regards
