On Thu, 6 Aug 2020 12:46:43 +0300 Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
> On 06/08/2020 12:17, Rasmus Villemoes wrote: > > On 06/08/2020 01.34, Stephen Hemminger wrote: > >> On Wed, 5 Aug 2020 16:25:23 +0200 > >> Rasmus Villemoes <rasmus.villem...@prevas.dk> wrote: > >> > >>> Hi, > >>> > >>> We're seeing occasional lockups on an embedded board (running an -rt > >>> kernel), which I believe I've tracked down to the > >>> > >>> if (!rtnl_trylock()) > >>> return restart_syscall(); > >>> > >>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task > >>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some > >>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that > >>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR > >>> loop, and the lower-priority task never gets runtime and thus cannot > >>> release the lock. > >>> > >>> I've written a script that rather quickly reproduces this both on our > >>> target and my desktop machine (pinning everything on one CPU to emulate > >>> the uni-processor board), see below. Also, with this hacky patch > >> > >> There is a reason for the trylock, it works around a priority inversion. > > > > Can you elaborate? It seems to me that it _causes_ a priority inversion > > since priority inheritance doesn't have a chance to kick in. > > > >> The real problem is expecting a SCHED_FIFO task to be safe with this > >> kind of network operation. > > > > Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit > > odd to do what is essentially a busy-loop - yes, the restart_syscall() > > allows signals to be delivered (including allowing the process to get > > killed), but in the absence of any signals, the pattern essentially > > boils down to > > > > while (!rtnl_trylock()) > > ; > > > > So even for regular tasks, this seems to needlessly hog the cpu. > > > > I tried this > > > > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c > > index 0318a69888d4..e40e264f9b16 100644 > > --- a/net/bridge/br_sysfs_br.c > > +++ b/net/bridge/br_sysfs_br.c > > @@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d, > > if (endp == buf) > > return -EINVAL; > > > > - if (!rtnl_trylock()) > > - return restart_syscall(); > > + if (rtnl_lock_interruptible()) > > + return -ERESTARTNOINTR; > > > > err = (*set)(br, val); > > if (!err) > > > > with the obvious definition of rtnl_lock_interruptible(), and it makes > > the problem go away. Isn't it better to sleep waiting for the lock (and > > with -rt, giving proper priority boost) or a signal to arrive rather > > than busy-looping back and forth between syscall entry point and the > > trylock()? > > > > I see quite a lot of > > > > if (mutex_lock_interruptible(...)) > > return -ERESTARTSYS; > > > > but for the rtnl_mutex, I see the trylock...restart_syscall pattern > > being used in a couple of places. So there must be something special > > about the rtnl_mutex? > > > > Thanks, > > Rasmus > > > > Hi Rasmus, > I haven't tested anything but git history (and some grepping) points to > deadlocks when > sysfs entries are being changed under rtnl. > For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and > 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c > This is a common usage pattern throughout net/, the bridge is not the only > case and there are more > commits which talk about deadlocks. > Again I haven't verified anything but it seems on device delete (w/ rtnl > held) -> sysfs delete > would wait for current readers, but current readers might be stuck waiting on > rtnl and we can deadlock. > I was referring to AB BA lock inversion problems. Yes the trylock goes back to: commit af38f2989572704a846a5577b5ab3b1e2885cbfb Author: Eric W. Biederman <ebied...@xmission.com> Date: Wed May 13 17:00:41 2009 +0000 net: Fix bridgeing sysfs handling of rtnl_lock Holding rtnl_lock when we are unregistering the sysfs files can deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix it with the now familiar patter of: rtnl_trylock and syscall_restart() Signed-off-by: Eric W. Biederman <ebied...@aristanetworks.com> Signed-off-by: David S. Miller <da...@davemloft.net> The problem is that the unregister of netdevice happens under rtnl and this unregister path has to remove sysfs and other objects. So those objects have to have conditional locking.