Re: rtnl_trylock() versus SCHED_FIFO lockup

Stephen Hemminger Wed, 05 Aug 2020 16:35:04 -0700

On Wed, 5 Aug 2020 16:25:23 +0200
Rasmus Villemoes <rasmus.villem...@prevas.dk> wrote:


> Hi,
> 
> We're seeing occasional lockups on an embedded board (running an -rt
> kernel), which I believe I've tracked down to the
> 
>             if (!rtnl_trylock())
>                     return restart_syscall();
> 
> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
> loop, and the lower-priority task never gets runtime and thus cannot
> release the lock.
> 
> I've written a script that rather quickly reproduces this both on our
> target and my desktop machine (pinning everything on one CPU to emulate
> the uni-processor board), see below. Also, with this hacky patch

There is a reason for the trylock: it works around a priority inversion.

The real problem is expecting a SCHED_FIFO task to be safe with this
kind of network operation.
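
For illustration, the starvation described in the report can be modeled
in userspace. The sketch below is a toy model, not kernel code: it assumes
one CPU and strict fixed-priority scheduling, and every name in it
(simulate, sim_result, the parameters) is made up for this example. It
ignores priority inheritance and any other tasks. The blocking mode is
included only for contrast; it is not a suggestion that the trylock can
simply be replaced.

```c
#include <stdbool.h>

/* Toy model: one CPU, two tasks, strict priority (SCHED_FIFO-like).
 * All names here are hypothetical; none of this is a kernel API. */
struct sim_result {
    bool writer_done;           /* did the high-priority writer get the lock? */
    unsigned long retries;      /* failed trylock attempts by the writer */
    unsigned long holder_ticks; /* CPU ticks the low-priority holder received */
};

/*
 * holder_work:    ticks of CPU the low-priority task needs before unlocking
 * writer_arrives: tick at which the high-priority writer becomes runnable
 * writer_blocks:  true  = writer sleeps on the lock until it is released
 *                 false = writer does trylock and restarts on failure
 * max_ticks:      upper bound on simulated time
 */
struct sim_result simulate(unsigned holder_work, unsigned writer_arrives,
                           bool writer_blocks, unsigned max_ticks)
{
    struct sim_result r = { false, 0, 0 };
    bool lock_held = true;        /* low-priority task starts out holding it */
    bool writer_sleeping = false;

    for (unsigned long t = 0; t < max_ticks && !r.writer_done; t++) {
        bool writer_runnable = t >= writer_arrives && !writer_sleeping;

        if (writer_runnable) {
            /* The highest-priority runnable task always gets the CPU. */
            if (!lock_held)
                r.writer_done = true;
            else if (writer_blocks)
                writer_sleeping = true; /* gives up the CPU */
            else
                r.retries++;            /* restart: stays runnable */
        } else if (lock_held) {
            /* The holder runs only when the writer is not runnable. */
            r.holder_ticks++;
            if (r.holder_ticks >= holder_work) {
                lock_held = false;       /* unlock ... */
                writer_sleeping = false; /* ... and wake the writer */
            }
        }
    }
    return r;
}
```

With simulate(5, 2, false, 1000) the writer never finishes: once it becomes
runnable at tick 2 it retries on every remaining tick and the holder gets no
further CPU time. With simulate(5, 2, true, 1000) the writer sleeps, the
holder completes its 5 ticks of work, and the writer then takes the lock.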
