On Mon, Nov 06, 2023 at 10:38:14PM +0100, William Roche wrote:
> Note also that large pages are taken into account too for our live
> migration, but the poisoning of a qemu large page requires more work
> especially for VM using standard 4k pages on top of these qemu large
> pages -- and this is a completely different issue. I'm mentioning this
> aspect here because even on Intel platforms, underlying large pages
> poisoning needs to be reported better to the running VM as a large
> section of its memory is gone (not just a single head 4k page), and
> adding live migration to this problem will not make things any better...

Good point.  Yes, poisoned huge pages seem to be broken all around.

> I did that in a self content test program: memory allocation,
> UFFDIO_REGISTER and use of UFFDIO_POISON. The register mode has to be
> given but MISSING or WP both works. This gives the possibility to inject
> poison in a much easier and better way than using
> madvise(... MADV_HWPOISON, ...) for example.

Indeed, I should have left a comment about that when reviewing the POISON
changes, had I noticed it; I overlooked that find_dst_vma(), despite its
name, also checks that the vma has a uffd context.  That shouldn't really
be necessary for UFFDIO_POISON.  I can consider proposing a patch to allow
that, which should be trivial.. but it won't help with old kernels, so QEMU
had better always register, so that it works whenever UFFD_FEATURE_POISON
is reported.. sad.

>
> But it implies a lot of other changes:
>   - The source has to flag the error pages to indicate a poison
>     (new flag in the exchange protocole)
>   - The destination has to be able to deal with the new protocole

IIUC these two can simply be implemented by migrating hwpoison_page_list
over to dest.  You need a compat bit for doing this, so that the list is
ignored on old machine types, because old QEMUs will not recognize this
vmsd.

QEMU should even support migrating a list object in a VMSD; feel free to
have a look at VMSTATE_QLIST_V().

>   - The destination has to be able to mark the pages as poisoned
>     (authorized to use userfaultfd)

Note: userfaultfd is actually available without any privilege if it is only
used for UFFDIO_POISON, as long as the uffd is opened (either via the
syscall or /dev/userfaultfd) with UFFD_USER_MODE_ONLY.

A trick is that we can register with UFFD_WP mode (not MISSING, because
with UFFD_USER_MODE_ONLY a kernel access to a missing page would then cause
SIGBUS), then inject whatever POISON we want.  As long as
UFFDIO_WRITEPROTECT is not invoked, UFFD_WP does nothing (unlike MISSING).

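To illustrate the trick above, a minimal sketch could look like the
following.  It is not QEMU code: uffd_poison_range() is a made-up helper
name, and the sketch assumes a kernel and uapi headers recent enough to
provide UFFDIO_POISON and UFFD_USER_MODE_ONLY (it returns -ENOSYS at build
time otherwise).  It keeps the uffd open and hands it back to the caller
rather than deciding when it is safe to close it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>   /* only needed by callers that mmap() the range */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: mark [addr, addr + len) as poisoned via userfaultfd.
 * Returns the open uffd (>= 0) on success, or -errno on failure.
 * addr and len are expected to be page aligned.
 */
static int uffd_poison_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#if defined(UFFDIO_POISON) && defined(UFFD_USER_MODE_ONLY) && defined(SYS_userfaultfd)
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_POISON,  /* fails on kernels without it */
    };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        /* WP rather than MISSING: without UFFDIO_WRITEPROTECT it is inert */
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    struct uffdio_poison poison = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = 0,
    };
    int err;
    int uffd;

    /* User-mode-only uffd: intended to work without extra privileges. */
    uffd = (int)syscall(SYS_userfaultfd,
                        O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
    if (uffd < 0) {
        return -errno;
    }
    if (ioctl(uffd, UFFDIO_API, &api) < 0 ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) < 0 ||
        ioctl(uffd, UFFDIO_POISON, &poison) < 0) {
        err = errno;
        close(uffd);
        return -err;
    }
    return uffd;
#else
    (void)addr;
    (void)len;
    return -ENOSYS;
#endif
}
```

The range should still be unpopulated when this is called, and any later
access to it is expected to raise SIGBUS, so a caller would only use it on
pages that the source reported as poisoned.
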
>   - So both source and destination have to be upgraded (of course
>     qemu but also an appropriate kernel version providing
>     UFFDIO_POISON on the destination)

True.  Unfortunately this is not avoidable.

>   - we may need to be able to negotiate a fall back solution
>   - an indication of the method to use could belong to the
>     migration capabilities and parameters

For the above two points: it's a common issue with migration compatibility.
As long as you can provide the above VMSD to migrate hwpoison_page_list,
and mark all old QEMU machine types to skip it, it should just work.  You
can have a closer look at anything in hw_compat_* as an example.

>   - etc...

I think you summarized pretty much all the points I can think of; is there
really anything more? :)

It'll be great if you can, or plan to, fix that for good.

Thanks,

-- 
Peter Xu
