* Daniel P. Berrangé ([email protected]) wrote:
> On Tue, Feb 13, 2018 at 03:09:12PM +0000, Dr. David Alan Gilbert wrote:
> > * Thomas Huth ([email protected]) wrote:
> > > We are currently facing some migration failure on s390x when running
> > > certain avocado tests, e.g. when running the test
> > > type_specific.io-github-autotest-qemu.migrate.with_reboot.exec.gzip_exec.
> > > This test is using 'migrate -d "exec:nc localhost 5200"' for the
> > > migration.
> > > The problem is detected at the receiving side, where the migration stream
> > > apparently ends too early. However, the cause for the problem is the
> > > sending side: After writing the migration stream into the pipe to netcat,
> > > the source QEMU calls qio_channel_command_close() which closes the pipe
> > > and immediately (!) kills the child process afterwards. So if the
> > > sending netcat did not read the final bytes from the pipe yet, or
> > > if it did not manage to send out all its buffers yet, it is killed
> > > before the whole migration stream is passed to the destination side.
> >
> > Thanks for tracking that down!
> >
> > > To ease the situation at least a little bit, we should give the child
> > > process at least some few more time slices before we kill it with
> > > SIGTERM and then with SIGKILL. With this change, the avocado test now
> > > succeeds here in 10 out of 10 runs.
> > >
> > > Signed-off-by: Thomas Huth <[email protected]>
> > > ---
> > >  io/channel-command.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/io/channel-command.c b/io/channel-command.c
> > > index 319c5ed..f64db3e 100644
> > > --- a/io/channel-command.c
> > > +++ b/io/channel-command.c
> > > @@ -177,11 +177,11 @@ static int
> > > qio_channel_command_abort(QIOChannelCommand *ioc,
> > >              return -1;
> > >          }
> > >      } else if (ret == 0) {
> > > -        if (step == 0) {
> > > +        if (step == 4) {
> > >              kill(ioc->pid, SIGTERM);
> > > -        } else if (step == 1) {
> > > +        } else if (step == 8) {
> > >              kill(ioc->pid, SIGKILL);
> > > -        } else {
> > > +        } else if (step >= 9) {
> >
> > Hmm. This seems pretty arbitrary; if I understand correctly you're
> > saying it'll get a SIGTERM after 4 (arbitrary) * 10ms (arbitrary).
> >
> > Who is to say that's enough for a scp or gzip or the like?
>
> We could conceivably implement the qio_channel_shutdown() operation
> for the QIOChannelCommand class. It would merely close the FD to the
> child process, but leave it running. That would give it time to read
> any data still in the pipe from QEMU IIUC.
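A minimal C sketch of the ordering Daniel describes, assuming a plain POSIX pipe()/fork() setup rather than QEMU's actual QIOChannelCommand internals: close the write end first so the child can drain whatever is left in the pipe and see EOF, wait up to a grace period for it to exit on its own, and only then fall back to SIGTERM and SIGKILL (keeping the 10ms polling step of the existing loop). The helpers close_and_reap() and wait_with_timeout() and the grace_ms parameter are hypothetical names for illustration, not existing QEMU functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU API): poll waitpid() every 10ms for up
 * to timeout_ms. Returns 1 if the child was reaped, 0 if it is still
 * running when the timeout expires, -1 on a waitpid() error.
 */
static int wait_with_timeout(pid_t pid, long timeout_ms, int *status)
{
    const struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10ms */

    for (long waited = 0; ; waited += 10) {
        pid_t ret = waitpid(pid, status, WNOHANG);
        if (ret == pid) {
            return 1;
        }
        if (ret == -1 && errno != EINTR) {
            return -1;
        }
        if (waited >= timeout_ms) {
            return 0;
        }
        nanosleep(&tick, NULL);
    }
}

/*
 * Hypothetical helper (not QEMU API): close our write end first so the
 * child can read the remaining buffered data and then get EOF; give it
 * grace_ms to exit by itself, then send SIGTERM and wait another
 * grace_ms, then SIGKILL. Returns 0 once the child is reaped (wait
 * status stored in *status), -1 on error.
 */
static int close_and_reap(int write_fd, pid_t pid, long grace_ms, int *status)
{
    int rc;

    close(write_fd);

    rc = wait_with_timeout(pid, grace_ms, status);
    if (rc != 0) {
        return rc == 1 ? 0 : -1;
    }

    kill(pid, SIGTERM);
    rc = wait_with_timeout(pid, grace_ms, status);
    if (rc != 0) {
        return rc == 1 ? 0 : -1;
    }

    /* SIGKILL cannot be caught or ignored, so block until it is reaped. */
    kill(pid, SIGKILL);
    while (waitpid(pid, status, 0) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}
```

With this ordering a child that exits promptly on EOF is reaped without being signalled at all. The grace period is still a tunable number, though, so the sketch does not by itself settle how long an scp or gzip should be allowed to take.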

Yeh that's better; although when would we call shutdown or close on it?

Dave

>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
