On Wed, Feb 16, 2022 at 3:52 PM Michael Roth <[email protected]> wrote:
>
> On Wed, Feb 16, 2022 at 10:12:36AM +0100, Markus Armbruster wrote:
> > Michael Roth <[email protected]> writes:
> >
> > > On Mon, Feb 14, 2022 at 03:14:37PM +0100, Markus Armbruster wrote:
> > >> Cc: the qemu-ga maintainer
> > >>
> > >> John Snow <[email protected]> writes:
> > >>
> > >> > [Moving our discussion upstream, because it stopped being brief and 
> > >> > simple.]
> > >
> > > Hi John, Markus,
> > >
> > >>
> > >> Motivation: qemu-ga doesn't do capability negotiation as specified in
> > >> docs/interop/qmp-spec.txt.
> > >>
> > >> Reminder: qmp-spec.txt specifies the server shall send a greeting
> > >> containing the capabilities on offer.  The client shall send a
> > >> qmp_capabilities command before any other command.
> > >>
> > >> We can't just fix qemu-ga to comply, because it would break existing
> > >> clients.
> > >>
> > >> We could document its behavior in qmp-spec.txt.  Easy enough, but also
> > >> kind of sad.
> > >
> > > I'm not sure we could've ever done it QMP-style with the initial
> > > greeting/negotiation mode. It's been a while, I but recall virtio-serial
> > > chardev in guest not having a very straight-forward way of flushing out
> > > data from the vring after a new client connects on the host side, so
> > > new clients had a chance of reading left-over garbage from previous
> > > client sessions. Or maybe it was open/close/open on the guest/chardev
> > > side that didn't cause the flush... anyway:
> > >
> > > This is why guest-sync was there, so you could verify the stream was
> > > in sync with a given "session ID" before continuing. But that doesn't
> > > help much if the stream is in some garbage state that parser can't
> > > recover from...
> > >
> > > This is why guest-sync-delimited was introduced; it inserts a 0xFF
> > > sential value (invalid for any normal QMP stream) prior to response that
> > > a client can scan for to flush the stream. Similarly, clients are
> > > supposed to precede guest-sync/guest-sync-delimited so QGA to get stuck
> > > trying to parse a partial read from an earlier client that is 'eating' a
> > > new request from a new client connection. I don't think these are really
> > > issues with vsock (or the other transports QGA accepts), but AFAIK
> > > Windows is still mostly reliant on virtio-serial, so these are probably
> > > still needed.
> >
> > I believe you're right about the reason being virtio-serial.  I
> > documented it that way in commit 72e9e569d0 "docs/interop/qmp-spec: How
> > to force known good parser state".
> >
> >     2.6 Forcing the JSON parser into known-good state
> >     -------------------------------------------------
> >
> >     Incomplete or invalid input can leave the server's JSON parser in a
> >     state where it can't parse additional commands.  To get it back into
> >     known-good state, the client should provoke a lexical error.
> >
> >     The cleanest way to do that is sending an ASCII control character
> >     other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new
> >     line).
> >
> >     Sadly, older versions of QEMU can fail to flag this as an error.  If a
> >     client needs to deal with them, it should send a 0xFF byte.
> >
> >     2.7 QGA Synchronization
> >     -----------------------
> >
> >     When a client connects to QGA over a transport lacking proper
> >     connection semantics such as virtio-serial, QGA may have read partial
> >     input from a previous client.  The client needs to force QGA's parser
> >     into known-good state using the previous section's technique.
> >     Moreover, the client may receive output a previous client didn't read.
> >     To help with skipping that output, QGA provides the
> >     'guest-sync-delimited' command.  Refer to its documentation for
> >     details.
> >
> > 0xFF is invalid UTF-8, which is kind of icky.  We should've used a
> > proper control character like EOT (end of transmission) from the start.
> > Water under the bridge.
> >
> > guest-sync has another design flaw: an unread command reply consisting
> > of just an integer can be confused with guest-sync's reply.  Unlikely as
> > long as guest-sync's @id argument is chosen at random, as its
> > documentation demands.
> >
> > guest-sync could be deprecated, I guess.
>
> Yes, should probably be deprecated in favor of guest-sync-delimited. I
> left it for clients that really don't want to dig into the transport
> layer to search for 0xFF, but still want at least some ability to
> re-sync.
>
> >
> > The @id argument of guest-sync and guest-sync-delimited feels kind of
> > redundant with the command object's @id member.  Except QGA didn't
> > conform to the QMP spec until commit 4eaca8de26 "qmp: common 'id'
> > handling & make QGA conform to QMP spec" (v4.0.0).  More water under the
> > bridge.
> >
> > Note that there's no need for all this when the transport provides
> > proper connection semantics.  Clients relying on connection semantics
> > work fine even when they don't bother with this syncing stuff.  Do such
> > clients exist?  We probably don't know.  May or may not matter.
>
> True, I think only virtio-serial and maybe isa-serial require the sync.
> I was hoping virtio-vsock might quickly become the default, then most
> clients would no longer need guest-sync*, but Windows still seems to be
> reliant on virtio-serial (AFAIK).
>
> >
> > > So QGA has sort of always had its own hand-shake, ideally via
> > > guest-sync-delimited. So if this new negotiation mechanism could
> > > build off of that, rather than introducing something on top, that would
> > > be ideal. Unfortunately it's naming isn't great for what's being done
> > > here, but 'synchronize' is sorta in the ball-park at least...
> >
> > Fair point.
> >
> > >> Is there a way to add capability negotiation to qemu-ga without breaking
> > >> existing clients?  We obviously have to make it optional.
> > >>
> > >> The obvious idea "make qmp_capabilities optional" doesn't work, because
> > >> the client needs to receive the greeting before sending
> > >> qmp_capabilities, to learn what capabilities are on offer.
> > >>
> > >> This leads to...
> > >>
> > >> > What about something like this:
> > >> >
> > >> > Add a new "request-negotiation" command to qemu-guest-agent 7.0.0.
> > >> >
> > >> > [Modern client to unknown server]
> > >> > 1. A modern client connects to a server of unknown version, and
> > >> > without waiting, issues the "request-negotiation" command.
> > >> > 2. An old server will reply with CommandNotFound. We are done 
> > >> > negotiating.
> > >> > 3. A modern server will reply with the greeting in the traditional
> > >> > format, but as a reply object (to preserve "execute" semantics.)
> > >> > 4. The modern client will now issue qmp-capabilities as normal.
> > >> > 5. The server replies with success or failure as normal.
> > >> > 6. Connection is fully established.
> > >> >
> > >> > [Old client to unknown server]
> > >> > 1. An old client connects to an unknown version server.
> > >> > 2. A command is issued some time later.
> > >> >   2a. The server is old, the command worked as anticipated.
> > >> >   2b. The server is new, the command fails with CommandNotFound and
> > >> > urges the use of 'request-negotiation'.
> > >>
> > >> A new server could accept the command, too.  This way, negotiation
> > >> remains optional, unlike in "normal" QMP.  Old clients don't negotiate,
> > >> and get default capabilities.
> > >
> > > ...so what if we had guest-sync/guest-sync-delimited start returning
> > > capabilities and version fields rather than a new request-negotiation
> > > command? If they aren't present we know the server is too old, and don't
> > > have to rely on CommandNotFound to determine that.
> >
> > Both guest-sync and guest-sync-delimited have a design flaw that defeats
> > such a straighforward extension: 'returns': 'int'.  There is no way to
> > return more data compatibly.
>
> Ugh, I misread the scheme and thought it was a struct with a single
> field... I knew that seemed to good to be true.
>
> >
> > We could add an optional flag to guest-sync-delimited to make it return
> > an object.  But I think we'd be better off with a new command.
>
> New cmd is probably for the best then, since hopefully guest-sync-delimited
> can become irrelevant in the future, and the possibility of a failed sync
> command on old clients (due to new param) getting mixed in with the recovery
> logic for a failed negotiation/capability probe, would probably make the
> whole interface a bit too unwieldy.
>
> > >> > - QGA is now officially on a different flavor of QMP protocol. You
> > >> > still need to know in advance if you are connecting to QGA or anything
> > >> > else. That's still a little sad, but maybe that's just simply an
> > >> > impossible goal.
> > >> >
> > >> > Bonus:
> > >> > - If an execution ID is used when sending "request-negotiation", we
> > >> > know that the server is at least version 4.0.0 if it responds to us
> > >> > using that ID. A modern client can then easily distinguish between
> > >> > pre-4.0, post-4.0 and post-7.0 servers. It's a useful probe.
> > >
> > > Is that in reference to the optional 'id' field that can be passed to
> > > QMP requests? Don't see the harm in that, and QGA should pass them back
> > > intact.
> >
> > I think it does since commit 4eaca8de26 "qmp: common 'id' handling &
> > make QGA conform to QMP spec" (v4.0.0).
>
> Ah, right, sort of remember now. Thanks for the pointer.
>
> -Mike
>

Thank you both for the history lesson!

This came up because qmp-shell and our qmp library claim to support
connecting to QGA targets, but they do so by just *not* sending a
greeting at all. I was completely ignorant that there was already a
different handshaking procedure needed for QGA targets. We don't use
it at all in our unit test infrastructure.

It was brought to my attention in particular that QGA prior to 4.0
does not support execution IDs properly, so I need to add some kind of
"probe" to the client to test for that support. In theory, a client
could just always opt to never use execution IDs when connecting to a
QGA target, but I find them to be extremely helpful for reading server
logs and so I believe that a Good Client :tm: should use them
absolutely all the time unless circumstances prohibit it.

Before I press on the idea of adding a more rigorous handshake, I'll
need to go and study the QGA limitations and do some more testing to
make sure I am actually solving any real problem, then. More
constraints than I was aware of. (Shucks.)

Thanks!
--js


Reply via email to