Re: apparent race condition in mttcg memory handling

Pierrick Bouvier Mon, 21 Jul 2025 10:26:53 -0700

On 7/21/25 10:14 AM, Michael Tokarev wrote:

On 21.07.2025 19:29, Pierrick Bouvier wrote:

On 7/21/25 9:23 AM, Pierrick Bouvier wrote:

..

looks like a good target for TSAN, which might expose the race without
really having to trigger it.
https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan


I think I tried with TSAN and it gave something useful even.
The prob now is to reproduce the thing by someone more familiar
with this stuff than me :)

Else, you can reproduce your run using rr record -h (chaos mode) [1],
which randomly schedules threads, until it catches the segfault, and
then you'll have a reproducible case to debug.


In case you never had opportunity to use rr, it is quite convenient,
because you can set a hardware watchpoint on your faulty pointer (watch
-l), do a reverse-continue, and in most cases, you'll directly reach
where the bug happened. Feels like cheating.


rr is the first thing I tried.  Nope, it's absolutely hopeless.   It
tried to boot just the kernel for over 30 minutes, after which I just
gave up.

I had a similar thing to debug recently, and with a simple loop, I
couldn't expose it easily. The bug I had was triggered with 3%
probability, which seems close to yours.
As rr record -h is single threaded, I found it useful to write a wrapper
script [1] to run one instance, and then run it in parallel using:

./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)

With that, I could expose the bug in 2 minutes reliably (vs trying for
more than one hour before). With your 64 cores, I'm sure it will quickly
expose it.

Might be worth a try, as you only need to catch the bug once to be able
to reproduce it.
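
To give an idea of the shape of such a wrapper, here is a minimal
sketch. It is not the script from [1]: the function name try_once, the
TRACE_ROOT and RECORDER variables and the /tmp/rr-traces default are all
made up for illustration. It assumes rr record exits non-zero when the
recorded program fails, and that rr honours _RR_TRACE_DIR for where the
trace is stored. Each run gets its own trace directory, and only the
trace of a failing run is kept.

```shell
#!/bin/sh
# Hypothetical helper (not the script referenced in [1]).
# try_once RUN_ID CMD [ARGS...]
#   Records one run of CMD under rr chaos mode and keeps the trace
#   only if the run failed.
# TRACE_ROOT: where traces are stored (assumed default: /tmp/rr-traces).
# RECORDER:   recording command, defaults to "rr record -h"; the
#             override only exists to exercise the helper without rr.
try_once() {
    run_id=$1
    shift
    trace_dir="${TRACE_ROOT:-/tmp/rr-traces}/run-$run_id"

    # Start from an empty per-run directory so parallel runs never
    # share a trace location.
    rm -rf "$trace_dir"
    mkdir -p "$trace_dir"

    # RECORDER is left unquoted on purpose so "rr record -h" splits
    # into separate words.
    if _RR_TRACE_DIR="$trace_dir" ${RECORDER:-rr record -h} "$@"; then
        rm -rf "$trace_dir"    # clean run, the trace is not interesting
        return 0
    fi

    echo "run $run_id failed, trace kept in $trace_dir"
    return 1
}
```

Saved as a script that ends with try_once "$@" (say ./try_once.sh, again
a made-up name), it could be driven with something like
"seq 10000 | parallel --bar -j$(nproc) --halt now,fail=1 ./try_once.sh {} <your qemu command line>",
so that parallel stops at the first failing run. The kept trace can then
be passed to rr replay.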


[1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh

Thanks,

/mjt
