On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote: > Peter Xu <[email protected]> writes: > > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote: > >> Peter Xu <[email protected]> writes: > >> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote: > >> >> The core yank code is strict about balanced registering and > >> >> unregistering of yank functions. > >> >> > >> >> This creates a difficulty because the migration code registers one > >> >> yank function per QIOChannel, but each QIOChannel can be referenced by > >> >> more than one QEMUFile. The yank function should not be removed until > >> >> all QEMUFiles have been closed. > >> >> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel > >> >> that has a yank function. Only unregister the yank function when all > >> >> QEMUFiles have been closed. > >> >> > >> >> This improves the current code by removing the need for the programmer > >> >> to know which QEMUFile is the last one to be cleaned up and fixes the > >> >> theoretical issue of removing the yank function while another QEMUFile > >> >> could still be using the ioc and require a yank. > >> >> > >> >> Signed-off-by: Fabiano Rosas <[email protected]> > >> >> --- > >> >> migration/yank_functions.c | 81 ++++++++++++++++++++++++++++++++++---- > >> >> migration/yank_functions.h | 8 ++++ > >> >> 2 files changed, 81 insertions(+), 8 deletions(-) > >> > > >> > I worry this over-complicate things. > >> > >> It does. We ran out of simple options. > >> > >> > If you prefer the cleaness that we operate always on qemufile level, can > >> > we > >> > just register each yank function per-qemufile? > >> > >> "just" hehe > >> > >> we could, but: > >> > >> i) the yank is a per-channel operation, so this is even more unintuitive; > > > > I mean we can provide something like: > > > > void migration_yank_qemufile(void *opaque) > > { > > QEMUFile *file = opaque; > > QIOChannel *ioc = file->ioc; > > > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); > > } > > > > void migration_qemufile_register_yank(QEMUFile *file) > > { > > if (migration_ioc_yank_supported(file->ioc)) { > > yank_register_function(MIGRATION_YANK_INSTANCE, > > migration_yank_qemufile, > > file); > > } > > } > > Sure, this is what I was thinking as well. IMO it will be yet another > operation that happens on the channel, but it performed via the > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the > world, of course, I just find it error-prone. > > >> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using > >> the ioc; > > > > We can keep using migration_ioc_[un]register_yank() for them if there's no > > qemufile attached. As long as the function will all be registered under > > MIGRATION_YANK_INSTANCE we should be fine having different yank func. > > > > ok > > >> > >> iii) we'll have to add a yank to every new QEMUFile created during the > >> incoming migration (colo, rdma, etc), otherwise the incoming side > >> will be left using iocs while the src uses the QEMUFile; > > > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported() > > will be a noop for it for either reg/unreg. > > > > Currently it seems we will also unreg the ioc even for RDMA (even though we > > don't reg for it). But since unreg will be a noop it seems all fine even > > if not paired.. maybe we should still try to pair it, e.g. register also in > > rdma_start_outgoing_migration() for the rdma ioc so at least they're paired. > > > > I don't see why COLO is special here, though. Maybe I missed something. > > For colo I was thinking we'd have to register the yank just to be sure > that all paths unregistering it have something to unregister. > > Maybe I should move the register into qemu_file_new_impl() with a > matching unregister at qemu_fclose().
Sounds good. Or... > > >> > >> iv) this is a functional change of the yank feature for which we have no > >> tests. > > > > Having yank tested should be preferrable. Lukas is in the loop, let's see > > whether he has something. We can still smoke test it before a selftest > > being there. > > > > Taking one step back.. I doubt whether anyone is using yank for migration? > > Knowing that migration already have migrate-cancel (for precopy) and > > migrate-pause (for postcopy). > > Right, both already call qio_channel_shutdown(). > > > I never used it myself, and I don't think > > it's supported for RHEL. How's that in suse's case? > > Never heard mention of it and I don't see it in our virtualization > documentation. > > > > > If no one is using it, maybe we can even avoid registering migration to > > yank? > > > > Seems reasonable to me. ... let's wait for a few days from Lukas to see whether he as any more input, or I'd vote for dropping yank for migration as a whole. It caused mostly more crashes that I knew than benefits, so far.. I also checked libvirt is not using yank. > > >> > >> If that's all ok to you I'll go ahead and git it a try. > >> > >> > I think qmp yank will simply fail the 2nd call on the qemufile if the > >> > iochannel is shared with the other one, but that's totally fine, IMHO. > >> > > >> > What do you think? > >> > > >> > In all cases, we should probably at least merge patch 1-8 if that can > >> > resolve the CI issue. I think all of them are properly reviewed. > >> > >> I agree. Someone needs to queue this though since Juan has been busy. > > > > Yes, I'll see what I can do. > > Thanks. I could even send a pull request myself if it would make things > easier. Let me know. That's definitely an option. But I want to make sure it's the same thing; I replied in Stefan's report. We can continue the discussion there for that. -- Peter Xu
