Ilya Leoshkevich <[email protected]> writes:

> On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
>> On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
>> > From: Thomas Huth <[email protected]>
>> > 
>> > We just have to make sure that we can set the endianness to big endian,
>> > then we can also run this test on s390x.
>> > 
>> > Signed-off-by: Thomas Huth <[email protected]>
>> > ---
>> >  Marked as RFC since it depends on the fix for this bug (so it
>> > cannot
>> >  be merged yet):
>> >  
>> > https://lore.kernel.org/qemu-devel/[email protected]/
>> > 
>> >  tests/functional/reverse_debugging.py        |  4 +++-
>> >  tests/functional/s390x/meson.build           |  1 +
>> >  tests/functional/s390x/test_reverse_debug.py | 21 ++++++++++++++++++++
>> >  3 files changed, 25 insertions(+), 1 deletion(-)
>> >  create mode 100755 tests/functional/s390x/test_reverse_debug.py
>> 
>> Reviewed-by: Ilya Leoshkevich <[email protected]>
>> 
>> 
>> I have a simple fix which helps with your original report, but not
>> with this test. I'm still investigating.
>> 
>> --- a/target/s390x/machine.c
>> +++ b/target/s390x/machine.c
>> @@ -52,6 +52,14 @@ static int cpu_pre_save(void *opaque)
>>          kvm_s390_vcpu_interrupt_pre_save(cpu);
>>      }
>>  
>> +    if (tcg_enabled()) {
>> +        /*
>> +         * Ensure symmetry with cpu_post_load() with respect to
>> +         * CHECKPOINT_CLOCK_VIRTUAL.
>> +         */
>> +        tcg_s390_tod_updated(CPU(cpu), RUN_ON_CPU_NULL);
>> +    }
>> +
>>      return 0;
>>  }
>
> Interestingly enough, this patch fails only under load, e.g., if I run
> make check -j"$(nproc)" or if I run your test in isolation, but with
> stress-ng cpu in background. The culprit appears to be:
>
> s390_tod_load()
>   qemu_s390_tod_set()
>     async_run_on_cpu(tcg_s390_tod_updated)
>
> Depending on the system load, this additional tcg_s390_tod_updated()
> may or may not end up being called during handle_backward(). If it
> does, we get an infinite loop again, because now we need two
> checkpoints.
>
> I have a feeling that this code may be violating some record-replay
> requirement, but I can't quite put my finger on it. For example,
> async_run_on_cpu() does not sound like something deterministic, but
> then again it just queues work for rr_cpu_thread_fn(), which is
> supposed to be deterministic.

If async_run_on_cpu() is called from the vCPU thread in response to a
deterministic event at a known point in time, it should be fine. If it
comes from another thread that is not synchronised via replay_lock, then
things will go wrong.
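
To illustrate the distinction, here is a toy model in Python. It is not
QEMU code: run_vcpu(), self_queue_at and external_arrival are made-up
names, and the "vCPU" is just a loop with an instruction counter and a
work queue. It only sketches why work queued at a guest-defined point
replays identically, while work whose arrival depends on host scheduling
may not.

```python
from collections import deque


def run_vcpu(n_insns, self_queue_at=None, external_arrival=None):
    """Toy model: return the (icount, event) pairs one vCPU loop observes.

    self_queue_at    -- icount at which the vCPU queues work for itself
                        (a deterministic, guest-visible point).
    external_arrival -- icount at which work queued by some other thread
                        happens to land (stands in for host load; this is
                        an assumption of the model, not a QEMU concept).
    """
    work = deque()
    trace = []
    for icount in range(n_insns):
        # Work from an unsynchronised thread lands wherever the host
        # scheduler lets it land.
        if external_arrival == icount:
            work.append("tod_updated(external)")
        # Work queued by the vCPU itself is tied to the instruction count.
        if self_queue_at == icount:
            work.append("tod_updated(vcpu)")
        # Drain the queue before "executing" the next instruction.
        while work:
            trace.append((icount, work.popleft()))
    return trace


# Same guest-defined trigger: record and replay agree.
record = run_vcpu(10, self_queue_at=3)
replay = run_vcpu(10, self_queue_at=3)

# Same external request, different host load: the runs diverge.
idle_host = run_vcpu(10, external_arrival=2)
busy_host = run_vcpu(10, external_arrival=7)
```

In this model record and replay are identical, whereas idle_host and
busy_host differ in the icount at which the queued work runs, which is
the kind of divergence that would break replay.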

But this is a VM load/save helper?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
