Copied netdev as that is the more appropriate mailing list for
questions like this.

Xianpeng Zhao <xpz...@aerohive.com> writes:

> Hi Group,
>
> I have found a problem in my system: there is a chance that the system
> enters an endless loop when trying to take the rtnl lock in a sysctl
> handler in net/ipv6/addrconf.c.
>
> The situation is like this: two processes, which we call A and B, both
> need the rtnl lock, and A has higher priority than B. B takes the rtnl
> lock to do something and is scheduled out without releasing it. At this
> point A runs "echo 1 > /proc/sys/net/ipv6/conf/<ifname>/disable_ipv6",
> and the write reaches this code:
>
>     if (!rtnl_trylock())
>         return restart_syscall();
>
> Because the rtnl lock is held by process B, the trylock fails and
> restart_syscall() makes sys_write run again. Even after many retries the
> lock is rarely released, because B has much lower priority, so the echo
> process started by A is stuck in a loop of restarted system calls.
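A minimal userspace sketch of the retry pattern described above, assuming a simulated lock rather than the real rtnl mutex: `struct model_lock`, `model_trylock()` and `model_write_busy_restart()` are hypothetical names, and the holder (process B) is modelled as a deadline on CLOCK_MONOTONIC. It only counts how often the failed-trylock path sends the caller straight back to retry while the lock is held; it cannot reproduce the priority starvation itself, where the spinning task keeps the holder from ever finishing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model: the lock counts as held until busy_until_ns
 * on the monotonic clock (standing in for process B). */
struct model_lock {
	int64_t busy_until_ns;
};

static int64_t model_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for rtnl_trylock(): succeeds only once the holder is done. */
static bool model_trylock(const struct model_lock *l)
{
	assert(l != NULL);
	return model_now_ns() >= l->busy_until_ns;
}

/* Mirrors "if (!rtnl_trylock()) return restart_syscall();" with the
 * syscall re-issued immediately: no sleep, no yield, pure spinning. */
static long model_write_busy_restart(const struct model_lock *l)
{
	long restarts = 0;

	while (!model_trylock(l))
		restarts++;
	return restarts;
}
```

With the lock held for 20 ms (`struct model_lock l = { model_now_ns() + 20 * 1000000LL };`), `model_write_busy_restart(&l)` typically reports a large number of restarts, every one of which burns CPU time.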
>
> In my case the rtnl lock was held by wireless_nlevent_process running in
> a kworker, and another, higher-priority process that used echo to
> disable IPv6 hit this problem.
>
> I am not completely sure, but I think it would be better to let process
> A sleep for a while instead of retrying again and again without any delay.
>
> Experts, what are your opinions?

That the entire situation is a mess.  From what little I have seen it is
a very rare condition.  Does this reproduce easily in your environment?

If we are going the delay route we probably want to put the delay in
restart_syscall or in a wrapper around restart_syscall that we use
for the rtnl_trylock failure case.
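A userspace sketch of the "wrapper around restart_syscall" idea, assuming a simulated lock (held until a deadline on CLOCK_MONOTONIC) in place of rtnl. The hypothetical `model_restart_after_delay()` sleeps with nanosleep() before asking for the retry, roughly what the proposed schedule_timeout_uninterruptible(HZ/4) does in the kernel. All names, the `MODEL_RESTART` code and the 5 ms delay are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MODEL_RESTART 1 /* hypothetical "please retry" return code */

/* Hypothetical lock model: held until busy_until_ns (CLOCK_MONOTONIC). */
struct model_lock {
	int64_t busy_until_ns;
};

static int64_t model_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static bool model_trylock(const struct model_lock *l)
{
	assert(l != NULL);
	return model_now_ns() >= l->busy_until_ns;
}

/* Hypothetical wrapper used only on the trylock-failure path:
 * sleep first, then ask the caller to restart. */
static int model_restart_after_delay(int64_t delay_ns)
{
	struct timespec d;

	assert(delay_ns > 0);
	d.tv_sec = (time_t)(delay_ns / 1000000000LL);
	d.tv_nsec = (long)(delay_ns % 1000000000LL);
	nanosleep(&d, NULL);
	return MODEL_RESTART;
}

/* Models the sysctl handler: retry later instead of blocking. */
static int model_disable_ipv6(const struct model_lock *l, int64_t delay_ns)
{
	if (!model_trylock(l))
		return model_restart_after_delay(delay_ns);
	return 0; /* lock "taken": the real handler would do its work here */
}

/* Models the syscall layer re-issuing the write until it succeeds. */
static long model_write_with_delay(const struct model_lock *l,
				   int64_t delay_ns)
{
	long restarts = 0;

	while (model_disable_ipv6(l, delay_ns) == MODEL_RESTART)
		restarts++;
	return restarts;
}
```

With the lock held for 20 ms and a 5 ms delay, the loop restarts at most four times, because each retry happens at least 5 ms after the previous one. The cost is that every failed attempt sleeps for the full delay even if the holder releases the lock immediately afterwards, and in the kernel schedule_timeout_uninterruptible() cannot be cut short by a signal.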

At first blush I would suggest the logic for sleeping should be:
        if (need_resched())
                schedule();

That will limit the spinning to a single time slice, which is definitely
preferable.
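The suggested `if (need_resched()) schedule();` has no direct userspace equivalent, so this sketch substitutes sched_yield() (a swapped-in technique, not the kernel logic) inside a hypothetical `model_restart_yield()` helper, with the lock simulated as a deadline on CLOCK_MONOTONIC. Per the sched_yield(2) man page, a thread that is alone at the highest priority keeps running after the call, so yielding does not by itself let a lower-priority holder run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MODEL_RESTART 1 /* hypothetical "please retry" return code */

/* Hypothetical lock model: held until busy_until_ns (CLOCK_MONOTONIC). */
struct model_lock {
	int64_t busy_until_ns;
};

static int64_t model_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static bool model_trylock(const struct model_lock *l)
{
	assert(l != NULL);
	return model_now_ns() >= l->busy_until_ns;
}

/* Userspace stand-in for "if (need_resched()) schedule();": offer the
 * CPU to other runnable threads, then ask for a restart. */
static int model_restart_yield(void)
{
	sched_yield();
	return MODEL_RESTART;
}

static int model_disable_ipv6(const struct model_lock *l)
{
	if (!model_trylock(l))
		return model_restart_yield();
	return 0; /* lock "taken" */
}

/* Re-issue the write until the handler stops asking for a restart. */
static long model_write_yield_restart(const struct model_lock *l)
{
	long restarts = 0;

	while (model_disable_ipv6(l) == MODEL_RESTART)
		restarts++;
	return restarts;
}
```

On an otherwise idle CPU sched_yield() returns almost immediately, so the restart count stays high: yielding only helps when the scheduler has another thread it is willing to run instead.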

Ugh. But we already cross the kernel/userspace boundary, and the return
to userspace already does that.

If you are encountering a deadlock, it is very much because you have
been playing very ugly priority games. At which point, my sympathies, but
this feels like a case of "Doctor, it hurts when I do this." "Then don't
do that."

> @@ -5304,8 +5308,10 @@ static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf)
>         struct net *net;
>         int old;
>  
> -       if (!rtnl_trylock())
> +       if (!rtnl_trylock()) {
> +               schedule_timeout_uninterruptible(HZ/4);
>                 return restart_syscall();
> +       }
>  
>         net = (struct net *)table->extra2;
>         old = *p;
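The proposed `schedule_timeout_uninterruptible(HZ/4)` sleeps for HZ/4 jiffies, which is about a quarter of a second whatever CONFIG_HZ is, with a small truncation when HZ is not a multiple of 4. A small sketch of that arithmetic; the helper names are hypothetical and are not kernel functions:

```c
#include <assert.h>

/* Jiffies requested by the patch's HZ/4 for a given tick rate hz
 * (C integer division truncates, as it does in the kernel). */
static unsigned long model_quarter_second_jiffies(unsigned long hz)
{
	assert(hz > 0);
	return hz / 4;
}

/* Convert a jiffies count back to milliseconds, truncating. */
static unsigned long model_jiffies_to_ms(unsigned long j, unsigned long hz)
{
	assert(hz > 0);
	return j * 1000UL / hz;
}
```

For HZ=100, 250, 300 and 1000 this gives 25, 62, 75 and 250 jiffies, i.e. 250, 248, 250 and 250 ms. In the kernel, msecs_to_jiffies(250) would express the same intent without relying on HZ being divisible by 4.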

Eric
