On Wed, 28 Jan 2026 09:30:24 -0300
Fabiano Rosas <[email protected]> wrote:

> Peter Xu <[email protected]> writes:
> 
> > On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:  
> >> Lukas Straub <[email protected]> writes:
> >>   
> >> >> [...]
> >> >>   
> >> >> > +        for (int i = 0; i < p->zero_num; i++) {
> >> >> > +            void *guest = p->block->host + p->zero[i];
> >> >> > +            memset(guest, 0, multifd_ram_page_size());
> >> >> > +        }    
> >> >> 
> >> >> At multifd_nocomp_recv, there will be a call to
> >> >> multifd_recv_zero_page_process(), which by that point will have p->host
> >> >> == p->block->colo_cache, so it looks like that function will do some
> >> >> zero page processing in the colo_cache, setting the rb->receivedmap for
> >> >> pages in the colo_cache and potentially also doing a memset. Is this
> >> >> intended?  
> >> >
> >> > rb->receivedmap is only for postcopy, right? So it doesn't apply with
> >> > colo.
> >> >  
> >> 
> >> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
> >> page causing multiple page faults"). So it seems we might be doing extra
> >> work on top of the colo_cache.  
> >
> > IIUC not extra, but exactly what will be needed.
> >
> > The logic was about "in a vanilla precopy, if we see a page arriving for
> > the 1st time we don't need to zero the buffer, because the buffer should
> > be zero-allocated".
> >
> > In COLO's case, COLO always puts RAM data into colo_cache, hence the same
> > logic should apply to colo_cache too, avoiding unnecessary memset()s on
> > colo_cache as well.
> >
> > E.g. colo_cache is allocated from qemu_anon_ram_alloc(), so it's also
> > guaranteed to be all zeros until it's first touched.
> >  
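
For reference, here's a simplified sketch of what
multifd_recv_zero_page_process() does after 5ef7e26bdb, as I read it
(abridged, so don't take it as the exact code):

    void multifd_recv_zero_page_process(MultiFDRecvParams *p)
    {
        for (int i = 0; i < p->zero_num; i++) {
            void *page = p->host + p->zero[i];

            if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
                /* The page was written before: zero it explicitly. */
                memset(page, 0, multifd_ram_page_size());
            } else {
                /* First arrival: the backing memory is zero-allocated,
                 * so just mark the page received and skip the memset. */
                ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
            }
        }
    }

With p->host pointing at colo_cache, the same first-touch optimization
carries over, which matches what Peter describes above.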
> >>   
> >> >> 
> >> >> I'm thinking that maybe it would be better overall to hook colo directly
> >> >> into multifd_nocomp_recv:  
> >> >
> >> > But then it will only work for nocomp, right? It feels like the wrong
> >> > level of abstraction to me.
> >> >  
> >> 
> >> Ah, nocomp != ram indeed.
> >>   
> >> >> 
> >> >> > [...]
> >> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> >> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> >> >> > --- a/migration/multifd-nocomp.c
> >> >> > +++ b/migration/multifd-nocomp.c
> >> >> > @@ -16,6 +16,7 @@
> >> >> >  #include "file.h"
> >> >> >  #include "migration-stats.h"
> >> >> >  #include "multifd.h"
> >> >> > +#include "multifd-colo.h"
> >> >> >  #include "options.h"
> >> >> >  #include "migration.h"
> >> >> >  #include "qapi/error.h"
> >> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          return -1;
> >> >> >      }
> >> >> >  
> >> >> > -    p->host = p->block->host;
> >> >> >      for (i = 0; i < p->normal_num; i++) {
> >> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >> >> >  
> >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          p->zero[i] = offset;
> >> >> >      }
> >> >> >  
> >> >> > +    if (migrate_colo()) {
> >> >> > +        multifd_colo_prepare_recv(p);
> >> >> > +        assert(p->block->colo_cache);
> >> >> > +        p->host = p->block->colo_cache;    
> >> >> 
> >> >> Can't you just use p->block->colo_cache later? I don't see why p->host
> >> >> needs to be set beforehand even in the non-colo case.  
> >> >
> >> > We should not touch guest RAM directly while in COLO state, since the
> >> > incoming guest is running: we either want to receive a whole checkpoint
> >> > (all RAM into the colo cache, plus all device state) and then apply it,
> >> > or, if anything goes wrong during checkpointing, keep the currently
> >> > running guest on the incoming side in a pristine state.
> >> >  
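
As a side note, the commit half of this is colo_flush_ram_cache() in
migration/ram.c. Here's a self-contained toy model of that step, with
all names made up for illustration (the real code walks a dirty bitmap
under RCU):

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    typedef struct ToyBlock {
        unsigned char *host;        /* live guest RAM */
        unsigned char *colo_cache;  /* staging buffer for the checkpoint */
        bool *dirty;                /* one flag per page */
        size_t npages;
        size_t page_size;
    } ToyBlock;

    /* Runs only once a complete checkpoint has been received. */
    static void toy_flush_ram_cache(ToyBlock *b)
    {
        for (size_t i = 0; i < b->npages; i++) {
            if (b->dirty[i]) {
                memcpy(b->host + i * b->page_size,
                       b->colo_cache + i * b->page_size,
                       b->page_size);
                b->dirty[i] = false;
            }
        }
    }

Until that flush, nothing written into colo_cache is visible to the
running guest, which is exactly the property described above.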
> >> 
> >> I was asking about setting p->host at this specific point. I don't think
> >> any of this fits the unfill function. However, I see those changes were
> >> suggested by Peter, so let's not go back and forth.  
> >
> > Actually I don't know why p->host existed before this work; IIUC we could
> > have always used p->block->host.  Maybe when Juan was developing this he
> > kept COLO in mind; or maybe he wanted to avoid frequent p->block pointer
> > dereferences.
> >  
> 
> Maybe p->block was being reset at some point, and setting p->host was
> the point where the (whatever) lock was released. I checked, and today
> there's no such thing. The p->mutex seems to be there just to protect
> this update in multifd_recv_sync_main:
> 
> WITH_QEMU_LOCK_GUARD(&p->mutex) {
>     if (multifd_recv_state->packet_num < p->packet_num) {
>         multifd_recv_state->packet_num = p->packet_num;
>     }
> }
> 
> > IIUC, we could remove p->host, but when we need to access "the buffer of
> > the ramblock" we'll need to call a helper to fetch that (either ramblock's
> > buffer, or colo_cache, per migrate_colo()).  And it might be slightly
> > slower than p->host indeed.
> >  
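
Such a helper could look roughly like this (hypothetical, not in the
tree, just to illustrate what Peter describes):

    static void *multifd_recv_host_buffer(MultiFDRecvParams *p)
    {
        if (migrate_colo()) {
            /* Stage into the colo cache, never into live guest RAM. */
            assert(p->block->colo_cache);
            return p->block->colo_cache;
        }
        return p->block->host;
    }

Calling it on every access adds a branch (or a call) each time, hence
the note that it might be slightly slower than caching p->host.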
> 
> Yeah, let's keep it; the compression code also uses it, so there's no
> point in removing it now.
> 

Actually, p->host was there first; p->block was added later for COLO in
commit 5d1d1fcf4 ("multifd: Add the ramblock to MultiFDRecvParams").
