On Thu, Sep 14, 2023 at 08:20:53PM +0000, William Roche wrote:
> From: William Roche <[email protected]>
>
> A Qemu VM can survive a memory error, as qemu can relay the error to the
> VM kernel which could also deal with it -- poisoning/off-lining the impacted
> page.
> This situation creates a hole in the VM memory address space that the VM
> kernel knows about (an unreadable page or set of pages).
>
> But the migration of this VM (live migration through the network or
> pseudo-migration with the creation of a state file) will crash Qemu when
> it sequentially reads the memory address space and stumbles on the
> existing hole.
>
> In order to correct this problem, I suggest to treat the poisoned pages as if
> they were zero-pages for the migration copy.
> This fix also works with underlying large pages, taking into account the
> RAMBlock segment "page-size".
> This fix is scripts/checkpatch.pl clean.
>
> v2:
>   - adding compressed transfer handling of poisoned pages
>
> Testing: I could verify that migration now works with a poisoned page
> through standard and compressed migration with 4k and large (2M) pages.
>
> The RDMA transfer is not considered by this patch.
>
> William Roche (1):
>   migration: skip poisoned memory pages on "ram saving" phase

If there's a new version, please consider adding a TODO above
control_save_page() noting that poisoned page handling is probably broken
there, so we can still remember it.

Reviewed-by: Peter Xu <[email protected]>

Copy: [email protected], [email protected]

Thanks,

--
Peter Xu
