On Fri, Sep 27, 2024 at 3:55 AM Peter Xu <[email protected]> wrote:
> On Fri, Sep 27, 2024 at 02:13:47AM +0800, Yong Huang wrote:
> > On Thu, Sep 26, 2024 at 3:17 AM Peter Xu <[email protected]> wrote:
> >
> > > On Fri, Sep 20, 2024 at 10:43:31AM +0800, Yong Huang wrote:
> > > > Yes, invoke migration_bitmap_sync_precopy more frequently is also my
> > > > first idea but it involves bitmap updating and interfere with the
> > > > behavior of page sending, it also affects the migration information
> > > > stats and interfere other migration logic such as
> > > > migration_update_rates().
> > >
> > > Could you elaborate?
> > >
> > > For example, what happens if we start to sync in ram_save_iterate() for
> > > some time intervals (e.g. 5 seconds)?
> > >
> >
> > I didn't try to sync in ram_save_iterate but in the
> > migration_bitmap_sync_precopy.
> >
> > If we use the migration_bitmap_sync_precopy in the ram_save_iterate
> > function, this approach seems to be correct. However, the bitmap will be
> > updated as the migration thread iterates through each dirty page in the
> > RAMBlock list. Compared to the existing implementation, this is different
> > but still straightforward; I'll give it a shot soon to see if it works.
>
> It's still serialized in the migration thread, so I'd expect it is similar
> to e.g. ->state_pending_exact() calls when QEMU flushed most dirty pages in
> the current bitmap.

What does "serialized" mean here? How about we:

1. Invoke migration_bitmap_sync_precopy() from a timer (bg_sync_timer)
   callback every 5 seconds.
2. Register bg_sync_timer in the main loop when the machine starts, like
   throttle_timer.
3. Activate the timer when ram_save_iterate() gets called, and deactivate it
   gracefully in ram_save_cleanup() during migration.

I think this is simple enough, and it also isn't "serialized"?

> > > Btw, we shouldn't have this extra sync exist if auto converge is disabled
> > > no matter which way we use, because it's pure overhead when auto converge
> > > is not in use.
> > >
> >
> > Ok, I'll add the check in the next version.
>
> Let's start with simple, and if there's anything unsure we can discuss
> upfront, just to avoid coding something and change direction later. Again,
> personally I think we shouldn't add too much new code to auto converge
> (unless very well justified, but I think it's just hard.. fundamentally with
> any pure throttling solutions), hopefully something small can make it start
> to work for huge VMs.
>
> Thanks,
>
> --
> Peter Xu

-- 
Best regards
