multifd: Remove channels_ready semaphore

Juan Quintela Fri, 20 Oct 2023 00:57:28 -0700

Peter Xu <pet...@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 08:28:05PM +0200, Juan Quintela wrote:
>> Peter Xu <pet...@redhat.com> wrote:
>> > On Thu, Oct 19, 2023 at 05:00:02PM +0200, Juan Quintela wrote:
>> >> Peter Xu <pet...@redhat.com> wrote:
>> >> > Fabiano,
>> >> >
>> >> > Sorry to look at this series late; I messed up my inbox after I 
>> >> > reworked my
>> >> > arrangement methodology of emails. ;)
>> >> >
>> >> > On Thu, Oct 19, 2023 at 11:06:06AM +0200, Juan Quintela wrote:
>> >> >> Fabiano Rosas <faro...@suse.de> wrote:
>> >> >> > The channels_ready semaphore is a global variable not linked to any
>> >> >> > single multifd channel. Waiting on it only means that "some" channel
>> >> >> > has become ready to send data. Since we need to address the channels
>> >> >> > by index (multifd_send_state->params[i]), that information adds
>> >> >> > nothing of value.
>> 
>> >> And that is what we do here.
>> >> We didn't had this last line (not needed for making sure the channels
>> >> are ready here).
>> >> 
>> >> But needed to make sure that we are maintaining channels_ready exact.
>> >
>> > I didn't expect it to be exact, I think that's the major part of confusion.
>> > For example, I see this comment:
>> >
>> > static void *multifd_send_thread(void *opaque)
>> >        ...
>> >         } else {
>> >             qemu_mutex_unlock(&p->mutex);
>> >             /* sometimes there are spurious wakeups */
>> >         }
>> 
>> I put that there during development, and let it there just to be safe.
>> Years later I put an assert() there and did lots of migrations, never
>> hit it.
>> 
>> > So do we have spurious wakeup anywhere for either p->sem or channels_ready?
>> > They are related, because if we got spurious p->sem wakeups, then we'll
>> > boost channels_ready one more time too there.
>> 
>> I think that we can change that for g_assert_not_reached()
>
> Sounds good.  We can also use an error_erport_once(), depending on your
> confidence of that. :)  Dropping that comment definitely helps.
>
> I had a quick look, indeed I think it's safe even with assert.  We may want
> to put some more comment on when one should kick p->sem; IIUC it can only
> be kicked in either (1) pending_job increased, or (2) set exiting=1.  Then
> it seems all guaranteed.


I think we can change the end of the loop from:

            qemu_mutex_unlock(&p->mutex);

            if (flags & MULTIFD_FLAG_SYNC) {
                qemu_sem_post(&p->sem_sync);
            }
        } else {
            qemu_mutex_unlock(&p->mutex);
            /* sometimes there are spurious wakeups */
        }

to:

            if (flags & MULTIFD_FLAG_SYNC) {
                qemu_sem_post(&p->sem_sync);
            }
        }
        qemu_mutex_unlock(&p->mutex);


And call it a day.  But we can leave one assert there.

But I would preffer to do this kind of locking changes at the beggining
of next cycle.

Later, Juan.

Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore

Reply via email to