On 14.02.20 12:02, Dr. David Alan Gilbert wrote:
> * David Hildenbrand ([email protected]) wrote:
>> On 14.02.20 11:42, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand ([email protected]) wrote:
>>>> On 14.02.20 11:25, Dr. David Alan Gilbert wrote:
>>>>> * David Hildenbrand ([email protected]) wrote:
>>>>>> Resizing while migrating is dangerous and does not work as expected.
>>>>>> The whole migration code works on the used_length of ram blocks and
>>>>>> does not expect this to change at random points in time.
>>>>>>
>>>>>> Precopy: The ram block size must not change on the source, after
>>>>>> ram_save_setup(), so as long as the guest is still running on the source.
>>>>>>
>>>>>> Postcopy: The ram block size must not change on the target, after
>>>>>> synchronizing the RAM block list (ram_load_precopy()).
>>>>>>
>>>>>> AFAIKS, resizing can be triggered *after* (but not during) a reset in
>>>>>> ACPI code by the guest
>>>>>> - hw/arm/virt-acpi-build.c:acpi_ram_update()
>>>>>> - hw/i386/acpi-build.c:acpi_ram_update()
>>>>>>
>>>>>> I see no easy way to work around this. Fail hard instead of failing
>>>>>> somewhere in migration code due to strange other reasons. AFAIKS, the
>>>>>> rebuilds will be triggered during reboot, so this should not affect
>>>>>> running guests, but only guests that reboot at a very bad time and
>>>>>> actually require size changes.
>>>>>>
>>>>>> Let's further limit the impact by checking if an actual resize of the
>>>>>> RAM (in number of pages) is required.
>>>>>>
>>>>>> Don't perform the checks in qemu_ram_resize(), as that's called during
>>>>>> migration when syncing the used_length. Update documentation.
>>>>>
>>>>> Interesting; we need to do something about this - but banning resets
>>>>> during migration is a bit harsh; and aborting the source VM is really
>>>>> nasty - for a precopy especially we shouldn't kill the source VM,
>>>>> we should just abort the migration.
>>>>
>>>> Any alternative, easy solutions to handle this? I do wonder how often
>>>> this will actually trigger in real life.
>>>
>>> Well it's not that hard to abort a migration (I'm not sure we've got a
>>> convenient wrapper to do it - but it shouldn't be hard to add).
>>>
>>
>> We do have qmp_migrate_cancel(). I hope that can be called under BQL.
>
> Well it's a monitor command so I think so; although it's not really
> designed for an error - it's a user action. Doing a
> migrate_set_error(..) followed by a qemu_file_shutdown is probably a
> good bet.
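To make the "set an error, then shut the file down" idea concrete, here is a
minimal standalone sketch. It is not QEMU code: struct mig_model and the
model_*() helpers are hypothetical stand-ins for the migration state,
migrate_set_error(..) and qemu_file_shutdown, and keeping only the first
recorded reason is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not QEMU's MigrationState or QEMUFile. */
struct mig_model {
    const char *error;   /* first recorded failure reason, NULL if none */
    bool channel_open;   /* stands in for the migration stream */
    bool vm_running;     /* the source guest */
};

/* Record why migration failed; keep the first reason if called again. */
static void model_set_error(struct mig_model *m, const char *reason)
{
    if (m->error == NULL) {
        m->error = reason;
    }
}

/* Close the stream so the migration code hits an I/O failure and unwinds. */
static void model_shutdown_channel(struct mig_model *m)
{
    m->channel_open = false;
}

/* Abort due to an internal error, as opposed to a user-requested cancel. */
static void model_abort_migration(struct mig_model *m, const char *reason)
{
    model_set_error(m, reason);
    model_shutdown_channel(m);
    /* vm_running is left untouched: only the migration is aborted. */
}
```

The point of the split is that the error is recorded before the stream goes
away, so whoever notices the dead stream can report the real reason instead
of a generic I/O failure.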
I'll base this on "[PATCH v2 fixed 00/16] Ram blocks with resizable
anonymous allocations under POSIX", where I extend the ram block notifier
with a resize notification. migration/ram.c can register the notifier and
react accordingly. E.g., for precopy, abort migration. Not sure about
postcopy (below).

>
>> Can that be called in both, precopy and postcopy case? I assume in the
>> precopy, it's easy.
>
> The problem is during postcopy you're toast when that happens because
> you can't restart; however, can this happen once we're actually in
> postcopy? It's a little different - if it happens before the transition
> to postcopy then it's the same as precopy; if it happens afterwards..
> well it's going to happen on the destination side and that's quite
> different.

If it happens after, we are in trouble at least with received bitmaps.
Not sure about other issues (it's a lot of code :) ). Especially
shrinking while trying to place pages will be bad and fail.

That code assumes used_length won't change: ramblock_recv_bitmap_send()
on the target and ram_dirty_bitmap_reload() on the source.
ram_dirty_bitmap_reload() will bail out if the sizes don't match.

-- 
Thanks,

David / dhildenb
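A rough standalone sketch of how a resize notification could be handled on
the migration side, combined with the "only if the number of pages actually
changes" check from the patch description. Everything here is hypothetical
(struct mig_state, mig_ram_block_resized(), the page_size parameter); it is
not the notifier interface from the series. Postcopy is only flagged as
failed, since the right reaction there is still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are QEMU types or APIs. */
enum mig_phase { MIG_NONE, MIG_PRECOPY, MIG_POSTCOPY };

struct mig_state {
    enum mig_phase phase;
    bool failed;          /* set once migration has been aborted */
    const char *error;    /* reason for the abort, NULL if none */
};

/* Number of pages needed to cover len bytes (rounded up). */
static uint64_t pages_for(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0);
    return len / page_size + (len % page_size != 0);
}

/* True only if the resize changes the page count migration works on. */
static bool resize_changes_pages(uint64_t old_len, uint64_t new_len,
                                 uint64_t page_size)
{
    return pages_for(old_len, page_size) != pages_for(new_len, page_size);
}

/* Resize notification: react only while migrating and only on real changes. */
static void mig_ram_block_resized(struct mig_state *s, uint64_t old_len,
                                  uint64_t new_len, uint64_t page_size)
{
    if (s->phase == MIG_NONE ||
        !resize_changes_pages(old_len, new_len, page_size)) {
        return;
    }
    s->failed = true;
    s->error = (s->phase == MIG_PRECOPY)
        ? "RAM block resized during precopy"    /* abort, source keeps running */
        : "RAM block resized during postcopy";  /* handling still undecided */
}
```

With a 4 KiB page size, growing a block from 4096 to 4100 bytes goes from one
page to two and fails the migration, while 4097 to 8192 stays at two pages and
is ignored.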
