On Wed, 28 Jan 2026 09:30:24 -0300 Fabiano Rosas <[email protected]> wrote:
> Peter Xu <[email protected]> writes: > > > On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote: > >> Lukas Straub <[email protected]> writes: > >> > >> >> [...] > >> >> > >> >> > + for (int i = 0; i < p->zero_num; i++) { > >> >> > + void *guest = p->block->host + p->zero[i]; > >> >> > + memset(guest, 0, multifd_ram_page_size()); > >> >> > + } > >> >> > >> >> At multifd_nocomp_recv, there will be a call to > >> >> multifd_recv_zero_page_process(), which by that point will have p->host > >> >> == p->block->colo_cache, so it looks like that function will do some > >> >> zero page processing in the colo_cache, setting the rb->receivedmap for > >> >> pages in the colo_cache and potentially also doing a memcpy. Is this > >> >> intended? > >> > > >> > rb->receivedmap is only for postcopy, right? So it doesn't apply with > >> > colo. > >> > > >> > >> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero > >> page causing multiple page faults"). So it seems we might be doing extra > >> work on top of the colo_cache. > > > > IIUC not extra, but exactly what will be needed. > > > > The logic was about "in a vanilla precopy, if we see one page arriving the > > 1st time we don't need to zero the buffer because the buffer should be zero > > allocated". > > > > In COLO's case, COLO always puts RAM data into colo_cache, hence it should > > apply to colo_cache too, avoiding unnecessary memset() for colo_cache > > instead. > > > > E.g. colo_cache is allocated from qemu_anon_ram_alloc(), it's also > > guaranteed to be zeros when never touched. > > > >> > >> >> > >> >> I'm thinking that maybe it would overall be better to hook colo directly > >> >> in to multifd_nocomp_recv: > >> > > >> > But then it will only work for nocomp, right? It feels like the wrong > >> > level of abstraction to me. > >> > > >> > >> Ah, nocomp != ram indeed. > >> > >> >> > >> >> > [...] > >> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c > >> >> > index > >> >> > 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 > >> >> > 100644 > >> >> > --- a/migration/multifd-nocomp.c > >> >> > +++ b/migration/multifd-nocomp.c > >> >> > @@ -16,6 +16,7 @@ > >> >> > #include "file.h" > >> >> > #include "migration-stats.h" > >> >> > #include "multifd.h" > >> >> > +#include "multifd-colo.h" > >> >> > #include "options.h" > >> >> > #include "migration.h" > >> >> > #include "qapi/error.h" > >> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams > >> >> > *p, Error **errp) > >> >> > return -1; > >> >> > } > >> >> > > >> >> > - p->host = p->block->host; > >> >> > for (i = 0; i < p->normal_num; i++) { > >> >> > uint64_t offset = be64_to_cpu(packet->offset[i]); > >> >> > > >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams > >> >> > *p, Error **errp) > >> >> > p->zero[i] = offset; > >> >> > } > >> >> > > >> >> > + if (migrate_colo()) { > >> >> > + multifd_colo_prepare_recv(p); > >> >> > + assert(p->block->colo_cache); > >> >> > + p->host = p->block->colo_cache; > >> >> > >> >> Can't you just use p->block->colo_cache later? I don't see why p->host > >> >> needs to be set beforehand even in the non-colo case. 
> >> >
> >> > We should not touch the guest ram directly while in colo state, since
> >> > the incoming guest is running and we either want to receive and apply a
> >> > whole checkpoint with all ram into colo cache and all device state,
> >> > or, if anything goes wrong during checkpointing, keep the currently
> >> > running guest on the incoming side in pristine state.
> >> >
> >>
> >> I was asking about setting p->host at this specific point. I don't think
> >> any of this fits the unfill function. However, I see those were
> >> suggested by Peter, so let's not go back and forth.
> >
> > Actually I don't know why p->host existed before this work; IIUC we could
> > have always used p->block->host. Maybe when Juan was developing this Juan
> > kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
> > references.
> >
>
> Maybe p->block was being reset at some point and p->host was passed,
> that being the point where the (whatever) lock was released. I checked and
> today there's no such thing. The p->mutex seems to be there just to
> protect against this in multifd_recv_sync_main:
>
>     WITH_QEMU_LOCK_GUARD(&p->mutex) {
>         if (multifd_recv_state->packet_num < p->packet_num) {
>             multifd_recv_state->packet_num = p->packet_num;
>         }
>     }
>
> > IIUC, we could remove p->host, but when we need to access "the buffer of
> > the ramblock" we'll need to call a helper to fetch that (either the
> > ramblock's buffer, or colo_cache, per migrate_colo()). And it might be
> > slightly slower than p->host indeed.
> >
>
> Yeah, let's keep it, the compression code also uses it, there's no point
> removing it now.
>
Actually, p->host was there first; p->block was added later for COLO in
commit 5d1d1fcf4 ("multifd: Add the ramblock to MultiFDRecvParams").
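
Going back to the zero-page point above, for reference: after commit
5ef7e26bdb the recv-side zero-page handling looks roughly like the sketch
below (paraphrased from memory, not verbatim). With p->host pointing at
colo_cache the same reasoning applies, since qemu_anon_ram_alloc() also
hands back zeroed memory:

    for (int i = 0; i < p->zero_num; i++) {
        void *page = p->host + p->zero[i];

        if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            /* Page was written to before, so it must be zeroed explicitly. */
            memset(page, 0, multifd_ram_page_size());
        } else {
            /*
             * First time this page is seen: the backing buffer (guest RAM
             * or colo_cache) is still in its zero-allocated state, so just
             * mark it received and skip the memset.
             */
            ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
        }
    }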
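
And to make the p->host trade-off concrete: if we ever did remove it, the
helper Peter describes would look something like this (hypothetical name,
just a sketch):

    static inline uint8_t *multifd_recv_buf(MultiFDRecvParams *p)
    {
        if (migrate_colo()) {
            /* In colo state all RAM goes into the cache, not guest RAM. */
            assert(p->block->colo_cache);
            return p->block->colo_cache;
        }
        return p->block->host;
    }

Every access would then pay the extra call/branch, which is the "slightly
slower than p->host" part.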
