On Fri, Sep 12, 2025 at 11:36:51AM -0400, Peter Xu wrote:
> On Fri, Sep 12, 2025 at 12:27:52PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Sep 11, 2025 at 05:23:54PM -0400, Peter Xu wrote:
> > > No issue I hit, the change is only from code observation when I am looking
> > > at a TLS premature termination issue.
> > > 
> > > qio_channel_tls_bye() API needs to be synchronous.  When it's not, the
> > > previous impl will attach an asynchronous task retrying but only until 
> > > when
> > > the channel gets the relevant GIO event. It may be problematic, because 
> > > the
> > > caller of qio_channel_tls_bye() may have invoked channel close() before
> > > that, leading to premature termination of the TLS session.
> > > 
> > > Remove the asynchronous handling, instead retry it immediately.  
> > > Currently,
> > > the only two possible cases that may lead to async task is either 
> > > INTERRUPT
> > > or EAGAIN.  It should be suffice to spin retry as of now, until a solid
> > > proof showing that a more complicated retry logic is needed.
> > > 
> > > With that, we can remove the whole async model for the bye task.
> > > 
> > > When at it, making the function return bool, which looks like a common
> > > pattern in QEMU when errp is used.
> > > 
> > > Side note on the tracepoints: previously the tracepoint bye_complete()
> > > isn't used.  Start to use it in this patch.  bye_pending() and 
> > > bye_cancel()
> > > can be dropped now.  Adding bye_retry() instead.
> > > 
> > > Signed-off-by: Peter Xu <[email protected]>
> > > ---
> > >  include/io/channel-tls.h |  5 ++-
> > >  io/channel-tls.c         | 86 +++++-----------------------------------
> > >  io/trace-events          |  3 +-
> > >  3 files changed, 15 insertions(+), 79 deletions(-)
> > > 

> > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > index 5a2c8188ce..8510a187a8 100644
> > > --- a/io/channel-tls.c
> > > +++ b/io/channel-tls.c
> > > @@ -253,84 +253,25 @@ void qio_channel_tls_handshake(QIOChannelTLS *ioc,
> > >      qio_channel_tls_handshake_task(ioc, task, context);
> > >  }
> > >  
> > > -static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition 
> > > condition,
> > > -                                       gpointer user_data);
> > > -
> > > -static void qio_channel_tls_bye_task(QIOChannelTLS *ioc, QIOTask *task,
> > > -                                     GMainContext *context)
> > > +bool qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp)
> > >  {
> > > -    GIOCondition condition;
> > > -    QIOChannelTLSData *data;
> > >      int status;
> > > -    Error *err = NULL;
> > >  
> > > -    status = qcrypto_tls_session_bye(ioc->session, &err);
> > > +    trace_qio_channel_tls_bye_start(ioc);
> > > +retry:
> > > +    status = qcrypto_tls_session_bye(ioc->session, errp);
> > >  
> > >      if (status < 0) {
> > >          trace_qio_channel_tls_bye_fail(ioc);
> > 
> > snip
> > 
> > > +        return false;
> > > +    } else if (status != QCRYPTO_TLS_BYE_COMPLETE) {
> > > +        /* BYE event must be synchronous, retry immediately */
> > > +        trace_qio_channel_tls_bye_retry(ioc, status);
> > > +        goto retry;
> > >      }
> > 
> > We cannot do this. If the gnutls_bye() API needs to perform
> > socket I/O, and so when we're running over a non-blocking
> > socket we must expect EAGAIN. With this handling, QEMU will
> > busy loop burning 100% CPU when the socket is not ready.
> 
> Right.  That was the plan when drafting this, the hope is spinning will
> almost not happen, and even if it happens it'll finish soon (migration is
> completing, it means network mustn't be so bad), unless the network is
> stuck exactly at when we send the bye().

Don't forget that the machine can be running 5 migrations
of different VMs concurrently, and so may not be as quick
to finish sending traffic as we expect. Since QEMU's mig
protocol is essentially undirectional, I wonder if the
send buffer could still be full of VMstate data waiting
to be sent ? Perhaps its fine, but I don't like relying
on luck, or hard-to-prove scenarios.

> > A second point is that from a QIOChannel POV, we need to
> > ensure that all APIs can be used in a non-blocking scenario.
> > This is why in the QIOChannelSocket impl connect/listen APIs
> > we provide both _sync and _async variants of the APIs, or
> > in the QIOChannelTLS impl, the handshake API is always
> > async with a callback to be invokved on completion.
> 
> I agree.  The issue is if so, migration code needs to be always be prepared
> with a possible async op even if in 99.9999% cases it won't happen... we
> need to complicate the multifd logic a lot for this, but the gain is
> little..
> 
> This series still used patch 1 to fix the problem (rather than do real BYE
> on preempt channels, for example) only because it's the easiest, after all
> it's still a contract in tls channel impl to allow premature termination
> for explicit shutdown()s on the host.
> 
> If we want to do 100% graceful shutdowns, we'll need to apply this to all
> channels, and the async-possible model can definitely add more complexity
> more than multifd.  I hope it won't be necessary.. but just to mention it.

Even if the migration code is relying on non-blocking sockets
for most of its work, at the time we're ready to invoke "bye",
perhaps the migration code could simply call

 qio_channel_set_blocking(ioc, true)

to switch the socket over to blocking mode.


> > The QIOChanel 'bye' method is flawed in that it is
> > asynchronous, but has no callback for completion.
> > 
> > If migration is /always/ using a blocking socket for the
> > TLS channels this isn't a problem as gnutls will complete
> > immediately, but if any async sockets are used we have no
> > way to wait for completion. This requires improving the
> > API design in some manner.
> 
> I recall one of your future series on TLS would start to enable async for
> all channels?  In all cases, we definitely don't want to have this call to
> be relevant to the blocking mode of the channels.
> 
> Would it make sense to introduce a _sync() version of it, but keep the
> original bye(), leaving the rest until a real async user appears?
> 
> I can also at least drop this patch as of now, because we can still wish it
> almost always be synchronous.  However we have risk forgetting that forever
> and hit it a few years later..

If we leave the current code as-is, and relying on migration switching
to blocking mode first before calling bye, we'll be ok

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to