Ilya Leoshkevich <[email protected]> writes:
> On Mon, 2025-12-01 at 10:36 +0000, Alex Bennée wrote:
>> Ilya Leoshkevich <[email protected]> writes:
>>
>> > On Sun, 2025-11-30 at 20:03 +0100, Ilya Leoshkevich wrote:
>> > > On Sun, 2025-11-30 at 19:32 +0100, Ilya Leoshkevich wrote:
>> > > > On Sun, 2025-11-30 at 16:47 +0000, Alex Bennée wrote:
>> > > > > Ilya Leoshkevich <[email protected]> writes:
>> > > > >
>> > > > > > On Fri, 2025-11-28 at 18:25 +0100, Ilya Leoshkevich wrote:
>> > > > > > > On Fri, 2025-11-28 at 14:39 +0100, Thomas Huth wrote:
>> > > > > > > > From: Thomas Huth <[email protected]>
<snip>
>> > > > > The the async_run_on_cpu is called from the vcpu thread in
>> > > > > response
>> > > > > to a
>> > > > > deterministic event at a known point in time it should be
>> > > > > fine.
>> > > > > If
>> > > > > it
>> > > > > came from another thread that is not synchronised via
>> > > > > replay_lock
>> > > > > then
>> > > > > things will go wrong.
>> > > > >
>> > > > > But this is a VM load save helper?
>> > > >
>> > > > Yes, and it's called from the main thread. Either during
>> > > > initialization, or as a reaction to GDB packets.
>> > > >
>> > > > Here is the call stack:
>> > > >
>> > > > qemu_loadvm_state()
>> > > > qemu_loadvm_state_main()
>> > > > qemu_loadvm_section_start_full()
>> > > > vmstate_load()
>> > > > vmstate_load_state()
>> > > > cpu_post_load()
>> > > > tcg_s390_tod_updated()
>> > > > update_ckc_timer()
>> > > > timer_mod()
>> > > > s390_tod_load()
>> > > > qemu_s390_tod_set() # via tdc->set()
>> > > > async_run_on_cpu(tcg_s390_tod_updated)
>> > > >
>> > > > So you think we may have to take the replay lock around
>> > > > load_snapshot()? So that all async_run_on_cpu() calls it makes
>> > > > end
>> > > > up
>> > > > being handled by the vCPU thread deterministically.
<snip>
>> >
>> > I believe now I at least understand what the race is about:
>> >
>> > - cpu_post_load() fires the TOD timer immediately.
>> >
>> > - s390_tod_load() schedules work for firing the TOD timer.
>>
>> Is this a duplicate of work then? Could we just rely on one or the
>> other? If you drop the cpu_post_load() tweak then the vmstate load
>> helper should still ensure everything works right?
>
> Getting rid of it fixes the problem and makes sense anyway.
>
>> > - If rr loop sees work and then timer, we get one timer callback.
>> >
>> > - If rr loop sees timer and then work, we get two timer callbacks.
>>
>> If the timer is armed we should expect at least to execute a few
>> instructions before triggering the timer, unless it was armed ready
>> expired.
>
> Yes, it is armed expired.
>
> Isn't it a deficiency in record-replay that work and timers are not
> ordered relative to each other? Can't it bite us somewhere else?
They normally should be although I notice:
void icount_handle_deadline(void)
{
assert(qemu_in_vcpu_thread());
int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
QEMU_TIMER_ATTR_ALL);
/*
* Instructions, interrupts, and exceptions are processed in cpu-exec.
* Don't interrupt cpu thread, when these events are waiting
* (i.e., there is no checkpoint)
*/
if (deadline == 0) {
icount_notify_aio_contexts();
}
}
should run the pre-expired timers before we exec the current TB. But the
comment suggests it is not expecting any checkpoint related activity. I
wonder if we can assert that is the case to catch future issues.
>> > - Record and replay may diverge due to this race.
>> >
>> > - In this particular case divergence makes rr loop spin: it sees
>> > that
>> > TOD timer has expired, but cannot invoke its callback, because
>> > there
>> > is no recorded CHECKPOINT_CLOCK_VIRTUAL.
>> >
>> > - The order in which rr loop sees work and timer depends on whether
>> > and when rr loop wakes up during load_snapshot().
>> >
>> > - rr loop may wake up after the main thread kicks the CPU and drops
>> > the BQL, which may happen if it calls, e.g.,
>> > qemu_cond_wait_bql().
--
Alex Bennée
Virtualisation Tech Lead @ Linaro