Peter Xu <pet...@redhat.com> wrote: > On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote: >> Peter Xu <pet...@redhat.com> wrote: >> > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote: >> >> Peter Xu <pet...@redhat.com> wrote: >> >> > Fabiano, >> >> > >> >> > Sorry to look at this series late; I messed up my inbox after I >> >> > reworked my >> >> > arrangement methodology of emails. ;) >> >> > >> >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote: >> >> >> Fabiano Rosas <faro...@suse.de> wrote: >> >> >> > The channels_ready semaphore is a global variable not linked to any >> >> >> > single multifd channel. Waiting on it only means that "some" channel >> >> >> > has become ready to send data. Since we need to address the channels >> >> >> > by index (multifd_send_state->params[i]), that information adds >> >> >> > nothing of value. >> >> >> And that is what we do here. >> >> We didn't had this last line (not needed for making sure the channels >> >> are ready here). >> >> >> >> But needed to make sure that we are maintaining channels_ready exact. >> > >> > I didn't expect it to be exact, I think that's the major part of confusion. >> > For example, I see this comment: >> > >> > static void *multifd_send_thread(void *opaque) >> > ... >> > } else { >> > qemu_mutex_unlock(&p->mutex); >> > /* sometimes there are spurious wakeups */ >> > } >> >> I put that there during development, and let it there just to be safe. >> Years later I put an assert() there and did lots of migrations, never >> hit it. >> >> > So do we have spurious wakeup anywhere for either p->sem or channels_ready? >> > They are related, because if we got spurious p->sem wakeups, then we'll >> > boost channels_ready one more time too there. >> >> I think that we can change that for g_assert_not_reached() > > Sounds good. We can also use an error_erport_once(), depending on your > confidence of that. :) Dropping that comment definitely helps. > > I had a quick look, indeed I think it's safe even with assert. We may want > to put some more comment on when one should kick p->sem; IIUC it can only > be kicked in either (1) pending_job increased, or (2) set exiting=1. Then > it seems all guaranteed.
I think we can change the end of the loop from: qemu_mutex_unlock(&p->mutex); if (flags & MULTIFD_FLAG_SYNC) { qemu_sem_post(&p->sem_sync); } } else { qemu_mutex_unlock(&p->mutex); /* sometimes there are spurious wakeups */ } to: if (flags & MULTIFD_FLAG_SYNC) { qemu_sem_post(&p->sem_sync); } } qemu_mutex_unlock(&p->mutex); And call it a day. But we can leave one assert there. But I would preffer to do this kind of locking changes at the beggining of next cycle. Later, Juan.