Ilya Leoshkevich <[email protected]> writes:

> On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
>> On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
>> > From: Thomas Huth <[email protected]>
>> >
>> > We just have to make sure that we can set the endianness to big
>> > endian,
>> > then we can also run this test on s390x.
>> >
>> > Signed-off-by: Thomas Huth <[email protected]>
>> > ---
>> >  Marked as RFC since it depends on the fix for this bug (so it
>> > cannot
>> >  be merged yet):
>> >
>> > https://lore.kernel.org/qemu-devel/[email protected]
>> > /
>> >
>> >  tests/functional/reverse_debugging.py        |  4 +++-
>> >  tests/functional/s390x/meson.build           |  1 +
>> >  tests/functional/s390x/test_reverse_debug.py | 21
>> > ++++++++++++++++++++
>> >  3 files changed, 25 insertions(+), 1 deletion(-)
>> >  create mode 100755 tests/functional/s390x/test_reverse_debug.py
>>
>> Reviewed-by: Ilya Leoshkevich <[email protected]>
>>
>>
>> I have a simple fix which helps with your original report, but not
>> with this test. I'm still investigating.
>>
>> --- a/target/s390x/machine.c
>> +++ b/target/s390x/machine.c
>> @@ -52,6 +52,14 @@ static int cpu_pre_save(void *opaque)
>>          kvm_s390_vcpu_interrupt_pre_save(cpu);
>>      }
>>
>> +    if (tcg_enabled()) {
>> +        /*
>> +         * Ensure symmetry with cpu_post_load() with respect to
>> +         * CHECKPOINT_CLOCK_VIRTUAL.
>> +         */
>> +        tcg_s390_tod_updated(CPU(cpu), RUN_ON_CPU_NULL);
>> +    }
>> +
>>      return 0;
>>  }
>
> Interestingly enough, this patch fails only under load, e.g., if I run
> make check -j"$(nproc)" or if I run your test in isolation, but with
> stress-ng cpu in background. The culprit appears to be:
>
>   s390_tod_load()
>     qemu_s390_tod_set()
>       async_run_on_cpu(tcg_s390_tod_updated)
>
> Depending on the system load, this additional tcg_s390_tod_updated()
> may or may not end up being called during handle_backward(). If it
> does, we get an infinite loop again, because now we need two
> checkpoints.
>
> I have a feeling that this code may be violating some record-replay
> requirement, but I can't quite put my finger on it. For example,
> async_run_on_cpu() does not sound like something deterministic, but
> then again it just queues work for rr_cpu_thread_fn(), which is
> supposed to be deterministic.

If the async_run_on_cpu is called from the vcpu thread in response to a
deterministic event at a known point in time it should be fine. If it
came from another thread that is not synchronised via replay_lock then
things will go wrong.

But this is a VM load/save helper?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
