On Wed, Sep 7, 2022 at 10:40 AM Peter Maydell <[email protected]> wrote:
> On Wed, 7 Sept 2022 at 16:39, Patrick Venture <[email protected]> wrote:
> >
> > # Start of nvme tests
> > # Start of pci-device tests
> > # Start of pci-device-tests tests
> > # starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-1431.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1431.qmp,id=char0 -mon chardev=char0,mode=control -display none -M virt, -cpu max -drive id=drv0,if=none,file=null-co://,file.read-zeroes=on,format=raw -object memory-backend-ram,id=pmr0,share=on,size=8 -device nvme,addr=04.0,drive=drv0,serial=foo -accel qtest
> >
> > # ERROR:../../src/qemu/tests/qtest/libqtest.c:338:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> > stderr:
> > double free or corruption (out)
> > socket_accept failed: Resource temporarily unavailable
> > **
> > ERROR:../../src/qemu/tests/qtest/libqtest.c:338:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
> > ../../src/qemu/tests/qtest/libqtest.c:165: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
> >
> > I'm not seeing this reliably, and we haven't done a lot of digging yet, such as enabling sanitizers, so I'll reply back to this thread with details as I have them.
> >
> > Has anyone seen this before or something like it?
>
> Have a look in the source at what exactly the assertion
> failure in libqtest.c is checking for -- IIRC it's a pretty
> basic "did we open a socket fd" one. I think sometimes I
> used to see something like this if there's an old stale socket
> lying around in the test directory and the randomly generated
> socket filename happens to clash with it.
>

Thanks for the debugging tip! I can't reproduce it at this point. I saw it 2-3 times, and now not at all. So more than likely it's exactly what you're describing.

> Everything after that is probably follow-on errors from the
> tests not being terribly clean about error handling.
>
> Are you running 'make check' with a -j option for parallel?
> (This is supposed to work, and it's the standard way I run
> 'make check', so if it's flaky we need to fix it, but it
> would be interesting to know if the issue repros at -j1.)
>

Since it's not reproducing reliably -- and I haven't actually seen it since the first few instances (and it was unrelated to those patches in flight), I'll have to sit on further debug until we reproduce it and then I can let you know, but this seems to be flaky at the point where it's hard to detect.

>
> -- PMM
>
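
For anyone who wants to poke at the stale-socket theory in isolation, here is a minimal sketch in Python of the general Unix behaviour: a socket file left behind at a path makes a later bind() to the same path fail until the file is unlinked. This is not libqtest code and makes no claim about how libqtest handles an existing path; the file name qtest-demo.sock and the helpers try_bind() and demo() are hypothetical, chosen only for illustration.

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Try to bind a Unix stream socket at path.

    Returns (sock, None) on success or (None, errno_value) on failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


def demo():
    """Show a stale socket file blocking a second bind to the same path."""
    tmpdir = tempfile.mkdtemp()
    # Hypothetical name, only loosely modelled on the paths in the log above.
    path = os.path.join(tmpdir, "qtest-demo.sock")

    # First bind creates the socket file on disk.
    first, _ = try_bind(path)
    # Closing the socket does not remove the file, so it is now stale.
    first.close()
    stale_exists = os.path.exists(path)

    # A second bind to the same path fails while the stale file exists.
    _, clash_err = try_bind(path)

    # After unlinking the stale file, binding works again.
    os.unlink(path)
    third, retry_err = try_bind(path)
    third.close()

    # Clean up the demo files.
    os.unlink(path)
    os.rmdir(tmpdir)
    return stale_exists, clash_err, retry_err


if __name__ == "__main__":
    stale_exists, clash_err, retry_err = demo()
    print("stale file left behind:", stale_exists)
    print("second bind errno:", errno.errorcode.get(clash_err))
    print("bind after unlink errno:", retry_err)
```

If the flakiness comes back, checking whether a leftover file already exists at the socket path named in the failing test's command line would be a cheap first step before reaching for sanitizers.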
