David Hildenbrand <[email protected]> wrote:
> Resizing while migrating is dangerous and does not work as expected.
> The whole migration code works on the usable_length of ram blocks and does
> not expect this to change at random points in time.
>
> Precopy: The ram block size must not change on the source, after
> ram_save_setup(), so as long as the guest is still running on the source.
>
> Postcopy: The ram block size must not change on the target, after
> synchronizing the RAM block list (ram_load_precopy()).
>
> AFAIKS, resizing can be triggered *after* (but not during) a reset in
> ACPI code by the guest
> - hw/arm/virt-acpi-build.c:acpi_ram_update()
> - hw/i386/acpi-build.c:acpi_ram_update()
>
> I see no easy way to work around this. Fail hard instead of failing
> somewhere in migration code due to strange other reasons. AFAIKS, the
> rebuilds will be triggered during reboot, so this should not affect
> running guests, but only guests that reboot at a very bad time and
> actually require size changes.
>
> Let's further limit the impact by checking if an actual resize of the
> RAM (in number of pages) is required.
>
> Don't perform the checks in qemu_ram_resize(), as that's called during
> migration when syncing the used_length. Update documentation.
>
> Cc: "Dr. David Alan Gilbert" <[email protected]>
> Cc: Eduardo Habkost <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Igor Mammedov <[email protected]>
> Cc: "Michael S. Tsirkin" <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Cc: Shannon Zhao <[email protected]>
> Cc: Alex Bennée <[email protected]>
> Cc: Shameerali Kolothum Thodi <[email protected]>
> Cc: Juan Quintela <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
>
> Any idea how to avoid killing the guest? Anything obvious I am missing?

If you avoid the resize, it should be ok for both precopy & postcopy.
But, as you point out, if the guest's ACPI code is the one changing
sizes, we are in trouble.

But really, it makes exactly zero sense to reset during migration. If we
_could_ catch the reset, the "intelligent" thing to do is:
- detect reset
- launch guest on destination from zero.

I.e. not migration at all. This would be my "better" idea, but I have
no clue how to catch that kind of thing in a sane way that works on
every architecture.

You get the:

Reviewed-by: Juan Quintela <[email protected]>

because:
- your code change makes sense
- the documentation update is good.

Thanks, Juan.
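
For illustration only, here is a minimal standalone sketch of the behaviour
the quoted commit message describes: a guest-triggered resize fails hard
while a migration is active, but only if the RAM size actually changes in
number of pages. This is not the patch itself. Every name below (the
sketch_* functions, the struct, the page size constant, the
migration_active flag) is a hypothetical stand-in and not QEMU's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical page size, chosen only for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for a RAM block; not QEMU's RAMBlock. */
struct sketch_ram_block {
    uint64_t used_length; /* bytes currently in use */
    uint64_t max_length;  /* upper bound for used_length */
};

/* Number of pages needed to hold 'bytes', rounded up. */
static uint64_t sketch_pages(uint64_t bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* True only if moving to new_len changes the page count. */
static bool sketch_resize_changes_pages(const struct sketch_ram_block *rb,
                                        uint64_t new_len)
{
    assert(rb != NULL);
    return sketch_pages(rb->used_length) != sketch_pages(new_len);
}

/*
 * Guest-triggered resize. Returns 0 on success and -1 if new_len exceeds
 * max_length. Exits the process if a migration is active and the page
 * count would change, instead of failing later for unrelated reasons.
 */
static int sketch_guest_resize(struct sketch_ram_block *rb, uint64_t new_len,
                               bool migration_active)
{
    assert(rb != NULL);
    if (new_len > rb->max_length) {
        return -1;
    }
    if (migration_active && sketch_resize_changes_pages(rb, new_len)) {
        fprintf(stderr, "RAM block resized during migration, giving up\n");
        exit(EXIT_FAILURE);
    }
    rb->used_length = new_len;
    return 0;
}
```

Keeping the check in a separate guest-facing entry point, rather than in the
low-level resize helper, mirrors the point in the commit message that the
low-level helper is also called by migration itself when syncing used_length.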
