On Mon, Sep 15, 2025 at 07:40:33PM +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 12, 2025 at 11:36:51AM -0400, Peter Xu wrote:
> > On Fri, Sep 12, 2025 at 12:27:52PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Sep 11, 2025 at 05:23:54PM -0400, Peter Xu wrote:
> > > > No issue I hit, the change is only from code observation when I am
> > > > looking at a TLS premature termination issue.
> > > >
> > > > The qio_channel_tls_bye() API needs to be synchronous.  When it's
> > > > not, the previous impl will attach an asynchronous task that only
> > > > retries once the channel gets the relevant GIO event.  That may be
> > > > problematic, because the caller of qio_channel_tls_bye() may have
> > > > invoked channel close() before that, leading to premature
> > > > termination of the TLS session.
> > > >
> > > > Remove the asynchronous handling and instead retry immediately.
> > > > Currently, the only two possible causes of an async task are either
> > > > INTERRUPT or EAGAIN.  It should suffice to spin-retry for now,
> > > > until there is solid proof that more complicated retry logic is
> > > > needed.
> > > >
> > > > With that, we can remove the whole async model for the bye task.
> > > >
> > > > While at it, make the function return bool, which is a common
> > > > pattern in QEMU when errp is used.
> > > >
> > > > Side note on the tracepoints: previously the tracepoint
> > > > bye_complete() wasn't used.  Start to use it in this patch.
> > > > bye_pending() and bye_cancel() can be dropped now.  Add
> > > > bye_retry() instead.
> > > >
> > > > Signed-off-by: Peter Xu <[email protected]>
> > > > ---
> > > >  include/io/channel-tls.h |  5 ++-
> > > >  io/channel-tls.c         | 86 +++++-----------------------------------
> > > >  io/trace-events          |  3 +-
> > > >  3 files changed, 15 insertions(+), 79 deletions(-)
> > > >
> > > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > > index 5a2c8188ce..8510a187a8 100644
> > > > --- a/io/channel-tls.c
> > > > +++ b/io/channel-tls.c
> > > > @@ -253,84 +253,25 @@ void qio_channel_tls_handshake(QIOChannelTLS *ioc,
> > > >      qio_channel_tls_handshake_task(ioc, task, context);
> > > >  }
> > > >
> > > > -static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition condition,
> > > > -                                       gpointer user_data);
> > > > -
> > > > -static void qio_channel_tls_bye_task(QIOChannelTLS *ioc, QIOTask *task,
> > > > -                                     GMainContext *context)
> > > > +bool qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp)
> > > >  {
> > > > -    GIOCondition condition;
> > > > -    QIOChannelTLSData *data;
> > > >      int status;
> > > > -    Error *err = NULL;
> > > >
> > > > -    status = qcrypto_tls_session_bye(ioc->session, &err);
> > > > +    trace_qio_channel_tls_bye_start(ioc);
> > > > +retry:
> > > > +    status = qcrypto_tls_session_bye(ioc->session, errp);
> > > >
> > > >      if (status < 0) {
> > > >          trace_qio_channel_tls_bye_fail(ioc);
> > >
> > > snip
> > >
> > > > +        return false;
> > > > +    } else if (status != QCRYPTO_TLS_BYE_COMPLETE) {
> > > > +        /* BYE event must be synchronous, retry immediately */
> > > > +        trace_qio_channel_tls_bye_retry(ioc, status);
> > > > +        goto retry;
> > > >      }
> > >
> > > We cannot do this.  The gnutls_bye() API may need to perform socket
> > > I/O, and so when we're running over a non-blocking socket we must
> > > expect EAGAIN.  With this handling, QEMU will busy loop burning 100%
> > > CPU when the socket is not ready.
> >
> > Right.
> > That was the plan when drafting this; the hope is the spinning will
> > almost never happen, and even if it happens it'll finish soon
> > (migration is completing, which means the network mustn't be so bad),
> > unless the network gets stuck exactly when we send the bye().
>
> Don't forget that the machine can be running 5 migrations of different
> VMs concurrently, and so may not be as quick to finish sending traffic
> as we expect.  Since QEMU's mig protocol is essentially unidirectional,
> I wonder if the send buffer could still be full of VMstate data waiting
> to be sent?  Perhaps it's fine, but I don't like relying on luck, or
> hard-to-prove scenarios.
Very rare conditional spinning is, IMHO, totally OK.  I wish all our
problems were as simple as CPU spinning.. then it's super
straightforward to debug when hit, and also benign.  We can add whatever
smart tech after that.  I would just guess that even if this patch goes
in, we will never observe it, considering the write buffer shouldn't be
more than tens of MBs..

Said that..

> > > A second point is that from a QIOChannel POV, we need to ensure
> > > that all APIs can be used in a non-blocking scenario.  This is why
> > > in the QIOChannelSocket impl we provide both _sync and _async
> > > variants of the connect/listen APIs, or in the QIOChannelTLS impl,
> > > the handshake API is always async with a callback to be invoked on
> > > completion.
> >
> > I agree.  The issue is, if so, migration code needs to always be
> > prepared for a possible async op even if in 99.9999% of cases it
> > won't happen... we'd need to complicate the multifd logic a lot for
> > this, but the gain is little..
> >
> > This series still used patch 1 to fix the problem (rather than do a
> > real BYE on preempt channels, for example) only because it's the
> > easiest; after all it's still a contract in the tls channel impl to
> > allow premature termination for explicit shutdown()s on the host.
> >
> > If we want to do 100% graceful shutdowns, we'll need to apply this
> > to all channels, and the async-possible model can definitely add
> > more complexity than multifd.  I hope it won't be necessary.. but
> > just to mention it.
>
> Even if the migration code is relying on non-blocking sockets for most
> of its work, at the time we're ready to invoke "bye", perhaps the
> migration code could simply call
>
>   qio_channel_set_blocking(ioc, true)
>
> to switch the socket over to blocking mode.

I think this is a good idea and should solve the problem indeed.  I
hope there's no loophole that could still trigger the async path.  I'll
respin with that, thanks!

-- 
Peter Xu
