Copied netdev as that is the more appropriate mailling list for questions like this.
Xianpeng Zhao <xpz...@aerohive.com> writes: > Hi Group, > > I have find a problem in my system, I found there have a chance that > cause the system enter dead loop when try to get the rtnl lock in the sysctl > function in net/ipv6/addrconf.c > > The situation should like this, there are 2 processes may need get > the rtnl lock, we call them process A and process B, A have high priority > than B. > B need get the rtnl lock to do something, when B schedule out without release > the lock; At this time, the A start to run "echo 1 > > /proc/sys/net/ipv6/conf/<ifname>/disable_ipv6", the echo process will run to > this code: > > if (!rtnl_trylock()) > > return restart_syscall(); > > Because the rtnl lock was hold by process B, so here the try will be failure, > and run the restart_syscall to let the sys_write do again, even try many > times, because the B have very lower priority, the lock was hard to be > released, so the echo process created by A will enter a loop of restart > system call. > > In my case it is the wireless_nlevent_process in process kworker taken the > rtnl lock, and another higher priority process need use echo to disable IPv6 > met this problem. > > I am not very sure, but I think it is better to let the process A sleep a > while instead of try it again and again without any delay. > > Expects, what's your opinions? That the entire situation is a mess. From what little I have seen it is a very rare condition. Does this reproduce easily in your environment? If we are going the delay route we probably want to put the delay in restart_syscall or in a wrapper around restart_syscall that we use for the rtnl_trylock failure case. On first blush I would suggest the logic for sleeping should be: if (need_reschedule()) schedule(); That will limit the spinning to a single time slice which is definitely preferrable. Ugh. But we already cross the kernel/userspace boundary that already does that. If you are encountering a deadlock it is very much because you have been playing very ugly priority games. At which point my sympathies but this feels like a case of "Docter it hurts when I do this. Then don't do that." > @@ -5304,8 +5308,10 @@ static int addrconf_disable_ipv6(struct ctl_table > *table, int *p, int newf) > > struct net *net; > > int old; > > > > - if (!rtnl_trylock()) > > + if (!rtnl_trylock()){ > > + schedule_timeout_uninterruptible(HZ/4); > > return restart_syscall(); > > + } > > > > net = (struct net *)table->extra2; > > old = *p; Eric