On Fri, Sep 12, 2025 at 11:36:51AM -0400, Peter Xu wrote:
> On Fri, Sep 12, 2025 at 12:27:52PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Sep 11, 2025 at 05:23:54PM -0400, Peter Xu wrote:
> > > No issue I hit, the change is only from code observation when I am looking
> > > at a TLS premature termination issue.
> > >
> > > qio_channel_tls_bye() API needs to be synchronous. When it's not, the
> > > previous impl will attach an asynchronous task retrying but only until
> > > when
> > > the channel gets the relevant GIO event. It may be problematic, because
> > > the
> > > caller of qio_channel_tls_bye() may have invoked channel close() before
> > > that, leading to premature termination of the TLS session.
> > >
> > > Remove the asynchronous handling, instead retry it immediately.
> > > Currently,
> > > the only two possible cases that may lead to async task is either
> > > INTERRUPT
> > > or EAGAIN. It should be suffice to spin retry as of now, until a solid
> > > proof showing that a more complicated retry logic is needed.
> > >
> > > With that, we can remove the whole async model for the bye task.
> > >
> > > When at it, making the function return bool, which looks like a common
> > > pattern in QEMU when errp is used.
> > >
> > > Side note on the tracepoints: previously the tracepoint bye_complete()
> > > isn't used. Start to use it in this patch. bye_pending() and
> > > bye_cancel()
> > > can be dropped now. Adding bye_retry() instead.
> > >
> > > Signed-off-by: Peter Xu <[email protected]>
> > > ---
> > > include/io/channel-tls.h | 5 ++-
> > > io/channel-tls.c | 86 +++++-----------------------------------
> > > io/trace-events | 3 +-
> > > 3 files changed, 15 insertions(+), 79 deletions(-)
> > >
> > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > index 5a2c8188ce..8510a187a8 100644
> > > --- a/io/channel-tls.c
> > > +++ b/io/channel-tls.c
> > > @@ -253,84 +253,25 @@ void qio_channel_tls_handshake(QIOChannelTLS *ioc,
> > > qio_channel_tls_handshake_task(ioc, task, context);
> > > }
> > >
> > > -static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition
> > > condition,
> > > - gpointer user_data);
> > > -
> > > -static void qio_channel_tls_bye_task(QIOChannelTLS *ioc, QIOTask *task,
> > > - GMainContext *context)
> > > +bool qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp)
> > > {
> > > - GIOCondition condition;
> > > - QIOChannelTLSData *data;
> > > int status;
> > > - Error *err = NULL;
> > >
> > > - status = qcrypto_tls_session_bye(ioc->session, &err);
> > > + trace_qio_channel_tls_bye_start(ioc);
> > > +retry:
> > > + status = qcrypto_tls_session_bye(ioc->session, errp);
> > >
> > > if (status < 0) {
> > > trace_qio_channel_tls_bye_fail(ioc);
> >
> > snip
> >
> > > + return false;
> > > + } else if (status != QCRYPTO_TLS_BYE_COMPLETE) {
> > > + /* BYE event must be synchronous, retry immediately */
> > > + trace_qio_channel_tls_bye_retry(ioc, status);
> > > + goto retry;
> > > }
> >
> > We cannot do this. If the gnutls_bye() API needs to perform
> > socket I/O, and so when we're running over a non-blocking
> > socket we must expect EAGAIN. With this handling, QEMU will
> > busy loop burning 100% CPU when the socket is not ready.
>
> Right. That was the plan when drafting this, the hope is spinning will
> almost not happen, and even if it happens it'll finish soon (migration is
> completing, it means network mustn't be so bad), unless the network is
> stuck exactly at when we send the bye().
Don't forget that the machine can be running 5 migrations
of different VMs concurrently, and so may not be as quick
to finish sending traffic as we expect. Since QEMU's mig
protocol is essentially undirectional, I wonder if the
send buffer could still be full of VMstate data waiting
to be sent ? Perhaps its fine, but I don't like relying
on luck, or hard-to-prove scenarios.
> > A second point is that from a QIOChannel POV, we need to
> > ensure that all APIs can be used in a non-blocking scenario.
> > This is why in the QIOChannelSocket impl connect/listen APIs
> > we provide both _sync and _async variants of the APIs, or
> > in the QIOChannelTLS impl, the handshake API is always
> > async with a callback to be invokved on completion.
>
> I agree. The issue is if so, migration code needs to be always be prepared
> with a possible async op even if in 99.9999% cases it won't happen... we
> need to complicate the multifd logic a lot for this, but the gain is
> little..
>
> This series still used patch 1 to fix the problem (rather than do real BYE
> on preempt channels, for example) only because it's the easiest, after all
> it's still a contract in tls channel impl to allow premature termination
> for explicit shutdown()s on the host.
>
> If we want to do 100% graceful shutdowns, we'll need to apply this to all
> channels, and the async-possible model can definitely add more complexity
> more than multifd. I hope it won't be necessary.. but just to mention it.
Even if the migration code is relying on non-blocking sockets
for most of its work, at the time we're ready to invoke "bye",
perhaps the migration code could simply call
qio_channel_set_blocking(ioc, true)
to switch the socket over to blocking mode.
> > The QIOChanel 'bye' method is flawed in that it is
> > asynchronous, but has no callback for completion.
> >
> > If migration is /always/ using a blocking socket for the
> > TLS channels this isn't a problem as gnutls will complete
> > immediately, but if any async sockets are used we have no
> > way to wait for completion. This requires improving the
> > API design in some manner.
>
> I recall one of your future series on TLS would start to enable async for
> all channels? In all cases, we definitely don't want to have this call to
> be relevant to the blocking mode of the channels.
>
> Would it make sense to introduce a _sync() version of it, but keep the
> original bye(), leaving the rest until a real async user appears?
>
> I can also at least drop this patch as of now, because we can still wish it
> almost always be synchronous. However we have risk forgetting that forever
> and hit it a few years later..
If we leave the current code as-is, and relying on migration switching
to blocking mode first before calling bye, we'll be ok
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|