On Wed, 5 Aug 2020 16:25:23 +0200 Rasmus Villemoes <rasmus.villem...@prevas.dk> wrote:
> Hi,
>
> We're seeing occasional lockups on an embedded board (running an -rt
> kernel), which I believe I've tracked down to the
>
>     if (!rtnl_trylock())
>         return restart_syscall();
>
> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
> loop, and the lower-priority task never gets runtime and thus cannot
> release the lock.
>
> I've written a script that rather quickly reproduces this both on our
> target and my desktop machine (pinning everything on one CPU to emulate
> the uni-processor board), see below. Also, with this hacky patch

There is a reason for the trylock, it works around a priority inversion.

The real problem is expecting a SCHED_FIFO task to be safe with this
kind of network operation.
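
For readers following the thread, the starvation described in the quoted
report can be modelled with a toy single-CPU, strict-priority scheduler.
This is only an illustrative sketch, not kernel code: the function
simulate(), its parameters, and the tick model are hypothetical, and the
blocking=True case is there purely for contrast, not as a proposed fix
(as noted above, the trylock exists for a reason).

```python
def simulate(blocking, max_ticks=1000):
    """Toy model of one CPU under SCHED_FIFO-like rules: the
    highest-priority runnable task always gets the tick.

    Returns (high_done, high_attempts, low_remaining).
    """
    lock_owner = "low"     # the low-priority task already holds the lock
    low_remaining = 3      # ticks of work low needs before it unlocks
    high_attempts = 0      # failed trylock/restart cycles by high
    high_done = False

    for _ in range(max_ticks):
        # High is runnable if the lock is free, or if it never sleeps on
        # the lock (trylock + immediate syscall restart).
        if not high_done and (lock_owner is None or not blocking):
            if lock_owner is None:
                high_done = True       # lock acquired, work finished
            else:
                high_attempts += 1     # trylock failed, restart and retry
            continue                   # low gets no CPU time this tick

        # High is asleep on the lock or finished, so low may run.
        if low_remaining > 0:
            low_remaining -= 1
            if low_remaining == 0:
                lock_owner = None      # low releases the lock

    return high_done, high_attempts, low_remaining


if __name__ == "__main__":
    print("trylock loop:", simulate(blocking=False))
    print("blocking    :", simulate(blocking=True))
```

In the trylock case the high-priority task consumes every tick retrying, so
the low-priority holder never runs and never releases the lock, which is
the lockup pattern the report describes.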