On 06. 02. 26, 12:54, Matthieu Baerts wrote:
> Our CI for the MPTCP subsystem is now regularly hitting various stalls
> before even starting the MPTCP test suite. These issues are visible on
> top of the latest net and net-next trees, which were synced with
> Linus' tree yesterday. All these issues have been seen on a "public CI"
> using GitHub-hosted runners with KVM support, where the tested kernel is
> launched in a nested (I suppose) VM. I can see the issue with or without
> debug.config. According to the logs, it might have started around
> v6.19-rc0, but I was unavailable for a few weeks and couldn't react
> sooner, sorry for that. Unfortunately, I cannot reproduce this locally,
> and the CI doesn't currently have the ability to run bisections.

Hmm, since the qemu guest kernels were switched to 6.19, our (openSUSE) build service has been randomly stalling in smp_call_function_many_cond() too:
https://bugzilla.suse.com/show_bug.cgi?id=1258936

The attachment there also contains sysrq-t logs:
https://bugzilla.suse.com/attachment.cgi?id=888612

> The stalls happen before starting the MPTCP test suite. The init program
> creates a VSOCK listening socket via socat [1], and different hangs are
> then visible: RCU stalls followed by a soft lockup [2], only a soft
> lockup [3], sometimes a soft lockup that comes with a delay [4] [5], or
> no RCU stall or soft lockup is detected after one minute, yet the VM
> is stalled [6]. In the last case, the VM is stopped after launching
> GDB to get more details about what was being executed.
>
> It feels like the issue is not directly caused by the VSOCK listening
> socket, but the stalls always happen after the socat command [1] has
> been started in the background.

It fails randomly while building arbitrary packages (go, libreoffice, bayle, ...). I don't think it is VSOCK-related in those cases, but who knows what the builds do...

I cannot reproduce locally either.

I came across:
  614da1d3d4cd x86: make page fault handling disable interrupts properly
but I have no idea whether it could have any impact on this.

thanks,
--
js
suse labs