On Wed, Oct 13, 2021 at 8:51 AM John Snow <[email protected]> wrote:
>
> On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <[email protected]> wrote:
>
>> On 13.10.21 00:34, John Snow wrote:
>> > Based-on: <[email protected]>
>> >           [PULL 00/10] Python patches
>> > GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
>> > CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591
>> >
>> > Hiya,
>> >
>> > This series continues where the last two AQMP series left off and adds a
>> > synchronous 'legacy' wrapper around the new AQMP interface, then drops
>> > it straight into iotests to prove that AQMP is functional and totally
>> > cool and fine. The disruption and churn to iotests is pretty minimal.
>> >
>> > In the event that a regression happens and I am not physically proximate
>> > to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
>> > to any non-empty string as it pleases you to engage the QMP machinery
>> > you are used to.
>> >
>> > I'd like to try and get this committed early in the 6.2 development
>> > cycle to give ample time to smooth over any possible regressions. I've
>> > tested it locally and via gitlab CI, across Python versions 3.6 through
>> > 3.10, and "worksforme". If something bad happens, we can revert the
>> > actual switch-flip very trivially.
>>
>> So running iotests locally, I got one failure:
>>
>> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
>> [...]
>> 300    fail    [10:28:06] [10:28:11]    5.1s    output mismatch (see 300.out.bad)
>> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
>> +++ 300.out.bad
>> @@ -1,4 +1,5 @@
>> -.......................................
>> +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: ConnectionResetError: [Errno 104] Connection reset by peer
>> +.........................
>> ----------------------------------------------------------------------
>> Ran 39 tests
>> [...]
>
> Oh, unfortunate.
>
>> I’m afraid I can’t really give a reproducer or anything. It feels like
>
> Thank you for the report!
>
>> just some random spurious timing-related error. Although then again,
>> 300 does have an `except machine.AbnormalShutdown` clause at one
>> point... So perhaps that’s the culprit, and we need to disable logging
>> there.
>
> I'll investigate!
Unfortunately, even running it in a loop some 150 times, I couldn't reproduce this one. As you point out, it appears to be just a failure caused by logging; the test logic itself completes as expected.

Still, I would expect a "clean" shutdown of the destination (where the destination process fails to load the migration stream and voluntarily exits with an error code) to end with a FIN/ACK for TCP, or ... uh, whatever the equivalent is for a UNIX socket. Where's the Connection Reset coming from? Did the destination VM process *crash*? I'm not so sure that I *should* silence this error, but I also can't reproduce it at all to answer these questions, so uh. uhhh. I guess I will just hammer it on a loop a few hundred times more and see if I get lucky.
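(For the record, the "loop" is nothing fancier than your exact invocation from above re-run until the output mismatches; the iteration count is arbitrary:

$ cd tests/qemu-iotests
$ for i in $(seq 1 300); do TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300 || break; done
)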

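If it does turn out that the right fix is to squelch the message around the expected failure in 300, what I have in mind is roughly the sketch below. Untested; suppress_qmp_errors is a hypothetical helper, and I'm assuming the noisy logger is a child of 'qemu.aqmp.qmp_client', per the ERROR line in your diff:

    import logging
    from contextlib import contextmanager

    @contextmanager
    def suppress_qmp_errors():
        # Hypothetical helper: temporarily raise the AQMP client logger's
        # threshold so the reader task's ERROR line can't leak into the
        # test output while we deliberately let the destination VM die.
        # Per-VM child loggers ('qemu.aqmp.qmp_client.<vm-name>') inherit
        # the parent's effective level, so this covers them too.
        logger = logging.getLogger('qemu.aqmp.qmp_client')
        saved = logger.level
        logger.setLevel(logging.CRITICAL)
        try:
            yield
        finally:
            logger.setLevel(saved)

The test would then wrap the region that expects the abnormal exit, something like (names hypothetical):

    with suppress_qmp_errors():
        try:
            dest_vm.shutdown()  # destination dies on purpose
        except machine.AbnormalShutdown:
            pass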