On Mon, Sep 15, 2025 at 07:40:33PM +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 12, 2025 at 11:36:51AM -0400, Peter Xu wrote:
> > On Fri, Sep 12, 2025 at 12:27:52PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Sep 11, 2025 at 05:23:54PM -0400, Peter Xu wrote:
> > > > No issue I hit, the change is only from code observation when I am 
> > > > looking
> > > > at a TLS premature termination issue.
> > > > 
> > > > qio_channel_tls_bye() API needs to be synchronous.  When it's not, the
> > > > previous impl will attach an asynchronous task retrying but only until 
> > > > when
> > > > the channel gets the relevant GIO event. It may be problematic, because 
> > > > the
> > > > caller of qio_channel_tls_bye() may have invoked channel close() before
> > > > that, leading to premature termination of the TLS session.
> > > > 
> > > > Remove the asynchronous handling, instead retry it immediately.  
> > > > Currently,
> > > > the only two possible cases that may lead to async task is either 
> > > > INTERRUPT
> > > > or EAGAIN.  It should be suffice to spin retry as of now, until a solid
> > > > proof showing that a more complicated retry logic is needed.
> > > > 
> > > > With that, we can remove the whole async model for the bye task.
> > > > 
> > > > When at it, making the function return bool, which looks like a common
> > > > pattern in QEMU when errp is used.
> > > > 
> > > > Side note on the tracepoints: previously the tracepoint bye_complete()
> > > > isn't used.  Start to use it in this patch.  bye_pending() and 
> > > > bye_cancel()
> > > > can be dropped now.  Adding bye_retry() instead.
> > > > 
> > > > Signed-off-by: Peter Xu <[email protected]>
> > > > ---
> > > >  include/io/channel-tls.h |  5 ++-
> > > >  io/channel-tls.c         | 86 +++++-----------------------------------
> > > >  io/trace-events          |  3 +-
> > > >  3 files changed, 15 insertions(+), 79 deletions(-)
> > > > 
> 
> > > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > > index 5a2c8188ce..8510a187a8 100644
> > > > --- a/io/channel-tls.c
> > > > +++ b/io/channel-tls.c
> > > > @@ -253,84 +253,25 @@ void qio_channel_tls_handshake(QIOChannelTLS *ioc,
> > > >      qio_channel_tls_handshake_task(ioc, task, context);
> > > >  }
> > > >  
> > > > -static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition 
> > > > condition,
> > > > -                                       gpointer user_data);
> > > > -
> > > > -static void qio_channel_tls_bye_task(QIOChannelTLS *ioc, QIOTask *task,
> > > > -                                     GMainContext *context)
> > > > +bool qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp)
> > > >  {
> > > > -    GIOCondition condition;
> > > > -    QIOChannelTLSData *data;
> > > >      int status;
> > > > -    Error *err = NULL;
> > > >  
> > > > -    status = qcrypto_tls_session_bye(ioc->session, &err);
> > > > +    trace_qio_channel_tls_bye_start(ioc);
> > > > +retry:
> > > > +    status = qcrypto_tls_session_bye(ioc->session, errp);
> > > >  
> > > >      if (status < 0) {
> > > >          trace_qio_channel_tls_bye_fail(ioc);
> > > 
> > > snip
> > > 
> > > > +        return false;
> > > > +    } else if (status != QCRYPTO_TLS_BYE_COMPLETE) {
> > > > +        /* BYE event must be synchronous, retry immediately */
> > > > +        trace_qio_channel_tls_bye_retry(ioc, status);
> > > > +        goto retry;
> > > >      }
> > > 
> > > We cannot do this. If the gnutls_bye() API needs to perform
> > > socket I/O, and so when we're running over a non-blocking
> > > socket we must expect EAGAIN. With this handling, QEMU will
> > > busy loop burning 100% CPU when the socket is not ready.
> > 
> > Right.  That was the plan when drafting this, the hope is spinning will
> > almost not happen, and even if it happens it'll finish soon (migration is
> > completing, it means network mustn't be so bad), unless the network is
> > stuck exactly at when we send the bye().
> 
> Don't forget that the machine can be running 5 migrations
> of different VMs concurrently, and so may not be as quick
> to finish sending traffic as we expect. Since QEMU's mig
> protocol is essentially undirectional, I wonder if the
> send buffer could still be full of VMstate data waiting
> to be sent ? Perhaps its fine, but I don't like relying
> on luck, or hard-to-prove scenarios.

Very rare conditional spinning is, IMHO, totally OK.  I wished all our
problems are as simple as cpu spinning.. then it's super straightforward to
debug when hit, and also benign.  We can add whatever smart tech after that.

I would just guess even if with this patch goes in, we will never observe
it considering the write buffer shouldn't be more than tens of MBs..

Said that..

> 
> > > A second point is that from a QIOChannel POV, we need to
> > > ensure that all APIs can be used in a non-blocking scenario.
> > > This is why in the QIOChannelSocket impl connect/listen APIs
> > > we provide both _sync and _async variants of the APIs, or
> > > in the QIOChannelTLS impl, the handshake API is always
> > > async with a callback to be invokved on completion.
> > 
> > I agree.  The issue is if so, migration code needs to be always be prepared
> > with a possible async op even if in 99.9999% cases it won't happen... we
> > need to complicate the multifd logic a lot for this, but the gain is
> > little..
> > 
> > This series still used patch 1 to fix the problem (rather than do real BYE
> > on preempt channels, for example) only because it's the easiest, after all
> > it's still a contract in tls channel impl to allow premature termination
> > for explicit shutdown()s on the host.
> > 
> > If we want to do 100% graceful shutdowns, we'll need to apply this to all
> > channels, and the async-possible model can definitely add more complexity
> > more than multifd.  I hope it won't be necessary.. but just to mention it.
> 
> Even if the migration code is relying on non-blocking sockets
> for most of its work, at the time we're ready to invoke "bye",
> perhaps the migration code could simply call
> 
>  qio_channel_set_blocking(ioc, true)
> 
> to switch the socket over to blocking mode.

... I think this is a good idea and should solve the problem indeed.  I
hope there's no loophole that it could still trigger the async path.

I'll respin with that, thanks!

-- 
Peter Xu


Reply via email to