On Fri, Aug 14, 2020 at 04:01:47PM -0700, Elena Ufimtseva wrote:
> On Tue, Aug 11, 2020 at 03:41:30PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jul 31, 2020 at 02:20:24PM -0400, Jagannathan Raman wrote:
> > > @@ -343,3 +349,49 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
> > > }
> > > }
> > > }
> > > +
> > > +static void hb_msg(PCIProxyDev *dev)
> > > +{
> > > +    DeviceState *ds = DEVICE(dev);
> > > +    Error *local_err = NULL;
> > > +    MPQemuMsg msg = { 0 };
> > > +
> > > +    msg.cmd = PROXY_PING;
> > > +    msg.bytestream = 0;
> > > +    msg.size = 0;
> > > +
> > > +    (void)mpqemu_msg_send_and_await_reply(&msg, dev->ioc, &local_err);
> > > +    if (local_err) {
> > > +        error_report_err(local_err);
> > > +        qio_channel_close(dev->ioc, &local_err);
> > > +        error_setg(&error_fatal, "Lost contact with device %s", ds->id);
> > > +    }
> > > +}
> >
> > Here is my feedback from the last revision. Was this addressed?
> >
>
> Hi Stefan,
>
> Thank you for reviewing the patchset. In this version we decided to
> shut down the guest when the heartbeat does not get a reply from the
> remote, by setting error_fatal.
> Should we approach it differently, or would you prefer that we get rid
> of the heartbeat in this form?

I think the only case that this patch handles is when the mpqemu channel
is closed. The VM hangs when the channel is still open but the remote is
unresponsive. (mpqemu_msg_send_and_await_reply() calls aio_poll() with
the global mutex held, so vcpus cannot make progress.)

The heartbeat mechanism needs to handle the case where the other side
isn't responding. It can't hang QEMU.

I suggest dropping this patch. It can be done later.

Stefan
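
To illustrate the distinction between "channel closed" and "channel open
but remote unresponsive", here is a minimal sketch in plain POSIX C. It is
not QEMU code and does not use the mpqemu API: hb_ping(), HbStatus and the
one-byte ping/pong protocol are hypothetical names made up for this example.
The point is only that the wait for the reply is bounded by a timeout, so a
silent peer produces a distinct result instead of blocking forever. A real
fix in QEMU would additionally have to avoid waiting with the global mutex
held, which this sketch does not attempt to show.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-byte protocol for illustration, NOT the mpqemu format. */
enum { HB_PING = 'P', HB_PONG = 'R' };

typedef enum {
    HB_OK,       /* peer answered in time */
    HB_TIMEOUT,  /* channel still open, but no reply within the deadline */
    HB_CLOSED,   /* peer closed the channel */
    HB_ERROR     /* any other failure, including an unexpected reply byte */
} HbStatus;

/*
 * Send one ping on a connected stream socket and wait at most timeout_ms
 * milliseconds for the reply. A retry after EINTR restarts the timeout,
 * which is acceptable for a sketch but makes the bound approximate.
 */
static HbStatus hb_ping(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char byte = HB_PING;
    ssize_t n;
    int rc;

    /* A negative timeout would make poll() block forever: the exact hang
     * this function is meant to avoid. */
    assert(timeout_ms >= 0);

    /* MSG_NOSIGNAL: report a closed peer as an error instead of SIGPIPE. */
    if (send(fd, &byte, 1, MSG_NOSIGNAL) != 1) {
        return (errno == EPIPE || errno == ECONNRESET) ? HB_CLOSED : HB_ERROR;
    }

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0) {
        return HB_TIMEOUT;
    }
    if (rc < 0) {
        return HB_ERROR;
    }

    n = recv(fd, &byte, 1, 0);
    if (n == 0) {
        return HB_CLOSED;   /* orderly shutdown by the peer */
    }
    if (n < 0) {
        return HB_ERROR;
    }
    return byte == HB_PONG ? HB_OK : HB_ERROR;
}
```

With this shape the caller can treat HB_TIMEOUT differently from HB_CLOSED,
for example by retrying a few times before declaring the remote lost, rather
than only noticing the failure once the channel has been torn down.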
