On 10/15/24 6:02 PM, Fabiano Rosas wrote:
Stefan Berger <stef...@linux.ibm.com> writes:
On 10/15/24 3:57 PM, Fabiano Rosas wrote:
Stefan Berger <stef...@linux.ibm.com> writes:
So this here is failing for you every time?
QTEST_QEMU_BINARY=build/qemu-system-aarch64 ./build/tests/qtest/tpm-tis-device-swtpm-test
Sorry, I was unclear. No, that runs for about 30 iterations before it
fails. I just ran each of these in a terminal window:
$ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break; done
On my Fedora 40 host this command line here alone has been running for
250 loop iterations now and is still continuing.
$ make -j$(nproc) check
So this needs to be run in parallel to the above command line to cause
the failure?
Yes, I've been using that method to reproduce live migration race
conditions as well. It's quite effective.
If you don't think you'll be able to find the root cause due to the
unreproducibility on your side, maybe we could at least add an assert
that bcount is not larger than rsp_size. I think that would at least
give an explicit error instead of a buffer overflow.
I can also try to dig deeper into this when I get some time. At the
moment I know nothing about the tpm device emulation.
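The bounds check suggested above could look roughly like the following.
This is only a sketch, not the actual test code: tpm_bcount_fits() and
copy_response() are hypothetical names, and it assumes that bcount is the
byte count taken from the device and rsp_size is the size of the
caller's response buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true when the byte count reported by the device
 * fits into the caller's response buffer.
 */
static bool tpm_bcount_fits(uint32_t bcount, size_t rsp_size)
{
    return bcount <= rsp_size;
}

/*
 * Hypothetical copy routine: abort with an explicit assertion failure
 * instead of writing past the end of rsp when bcount is too large.
 */
static void copy_response(uint8_t *rsp, size_t rsp_size,
                          const uint8_t *src, uint32_t bcount)
{
    assert(tpm_bcount_fits(bcount, rsp_size));
    memcpy(rsp, src, bcount);
}
```

With a check like this, an oversized count stops the test at the
assertion rather than corrupting memory beyond the buffer.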
The loop has run 3000 times by itself so that part is stable. However,
it seems there is some other test case that the loop cannot run in
parallel with. So, yes, there is 'something'. ... ... With all CPUs in
the system busy, the test also has to wait for migration to complete on
the dst_qemu side. Can you try it with this patch:
diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index fb94496bbd..b52cd44841 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
     tpm_util_migrate(src_qemu, uri);
     tpm_util_wait_for_migration_complete(src_qemu);
+    tpm_util_wait_for_migration_complete(dst_qemu);
     tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
                      sizeof(tpm_pcrread_resp));
For me this fixes the issue I had seen, where the STS register was read
too early, before the TPM TIS state had been completely restored. The
active locality was -1, STS returned 0xffffffff, and from then on
things went bad.
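To illustrate the ordering problem, here is a minimal, self-contained
sketch of waiting for both sides before touching the destination device.
It does not use the qtest API: wait_until_done(), both_sides_complete(),
struct fake_vm and fake_done() are hypothetical stand-ins, and a real
helper would query the migration status of each QEMU instance and sleep
between polls.

```c
#include <stdbool.h>

/* Hypothetical status callback: true once migration has completed. */
typedef bool (*migration_done_fn)(void *opaque);

/* Poll `done` up to max_polls times; true once it reports completion. */
static bool wait_until_done(migration_done_fn done, void *opaque,
                            int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (done(opaque)) {
            return true;
        }
    }
    return false;
}

/*
 * Only report success when the source AND the destination are done.
 * Waiting for the source alone leaves a window in which the
 * destination's device state has not been restored yet.
 */
static bool both_sides_complete(migration_done_fn done, void *src,
                                void *dst, int max_polls)
{
    return wait_until_done(done, src, max_polls) &&
           wait_until_done(done, dst, max_polls);
}

/* Fake VM for demonstration: completes after polls_left polls. */
struct fake_vm {
    int polls_left;
};

static bool fake_done(void *opaque)
{
    struct fake_vm *vm = opaque;

    if (vm->polls_left > 0) {
        vm->polls_left--;
        return false;
    }
    return true;
}
```

In this model a destination that lags a few polls behind the source is
still handled correctly, which is the situation a heavily loaded host
makes more likely.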