On Mon, 16 Jul 2018 09:31:06 +0200
André Pribil <a.pri...@beck-ipc.com> wrote:

> Hello,
> 
> I'm using kernel 4.14.52-rt34 on a single-core ARM system and I'm seeing a
> deadlock inside the kernel when two RT processes make calls with the right
> timing. The first process is trying to bring the Ethernet interface up
> with the SIOCSIFFLAGS ioctl(). The second process is checking the Ethernet
> carrier, speed and duplex status by reading e.g.
> "/sys/class/net/eth1/speed".
> 
> The first process finally gets to phy_poll_reset() in 
> drivers/net/phy/phy_device.c, where it calls msleep(50). 
> It never returns from the sleep.
> 
> The second process gets to speed_show() in net/core/net-sysfs.c. It tries
> to get the RTNL lock with rtnl_trylock(), but fails and calls
> restart_syscall(). This happens over and over again.
> 
> It seems like the first process is no longer scheduled and cannot release
> the RTNL lock, while the second process is busy restarting the syscall. The
> first process has a higher RT priority than the second process.
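
A toy model of the pattern described above may help to reason about it. It
assumes (this is not confirmed anywhere in the report) that the msleep()
wakeup has to be delivered by a timer kernel thread which runs at a lower
priority than the polling process. Everything in it is hypothetical: the task
names, the priorities and the one-tick-per-step scheduler are illustration
only, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one CPU, strict fixed-priority scheduling, one step per tick.
 * All names and priority values are made up for illustration. */
enum { TASK_UP = 0, TASK_POLL = 1, TASK_TIMER = 2, NTASKS = 3 };

struct task {
    int  prio;      /* higher number wins the CPU */
    bool runnable;
};

/* Returns the tick at which the poller finally finds the lock free,
 * or -1 if that never happens within max_ticks. */
int simulate(int timer_prio, int max_ticks)
{
    assert(max_ticks > 0);

    struct task t[NTASKS] = {
        /* holds the lock and sleeps until the timer thread wakes it */
        [TASK_UP]    = { .prio = 80,         .runnable = false },
        /* trylock + restart loop: never sleeps, always runnable */
        [TASK_POLL]  = { .prio = 70,         .runnable = true  },
        /* must get the CPU once to deliver the wakeup */
        [TASK_TIMER] = { .prio = timer_prio, .runnable = true  },
    };
    bool lock_held = true;

    for (int tick = 0; tick < max_ticks; tick++) {
        /* Pick the runnable task with the highest priority. */
        int cur = -1;
        for (int i = 0; i < NTASKS; i++)
            if (t[i].runnable && (cur < 0 || t[i].prio > t[cur].prio))
                cur = i;
        if (cur < 0)
            return -1;

        switch (cur) {
        case TASK_TIMER:            /* timer expiry wakes the sleeper */
            t[TASK_UP].runnable = true;
            t[TASK_TIMER].runnable = false;
            break;
        case TASK_UP:               /* sleeper finishes and drops the lock */
            lock_held = false;
            t[TASK_UP].runnable = false;
            break;
        case TASK_POLL:             /* trylock: succeed or retry at once */
            if (!lock_held)
                return tick;
            break;
        }
    }
    return -1;
}
```

In this model simulate(1, 1000) returns -1: the poller outranks the timer
thread, so the sleeper is never woken even though it has the highest priority
of all three. With simulate(90, 1000) the timer thread runs first and the
poller gets the lock at tick 2.
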
>                                                          
> Just for testing I've added the TIF_NEED_RESCHED flag to the
> restart_syscall() function and I did not see the deadlock again with this
> change.
> 
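> static inline int restart_syscall(void)
> {
>       /* TIF_* values are bit numbers, not masks: set each flag separately. */
>       set_tsk_thread_flag(current, TIF_SIGPENDING);
>       set_tsk_thread_flag(current, TIF_NEED_RESCHED);
>       return -ERESTARTNOINTR;
> }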
> 
> As a second test I released the RTNL lock while calling msleep() in 
> phy_poll_reset(). This also made the problem disappear.
> 
> I've found this thread, where a similar issue with restart_syscall() has been 
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
> 
> Any ideas how to fix this issue?
> 
> Andre   

Don't do control operations from RT processes!
There can be cases of priority inversion where an RT process is waiting for
something that requires a kthread to run before the operation can complete.
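
One way to follow that advice from userspace is to hand control operations to
a thread that is explicitly SCHED_OTHER. The following is only a sketch under
assumptions: run_control_op() and report_policy() are hypothetical names, and
report_policy() merely stands in for the real work (the ioctl() or the sysfs
read).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

typedef void *(*ctl_fn)(void *);

/* Hypothetical helper: run fn(arg) on a thread that is explicitly
 * SCHED_OTHER, whatever the caller's own policy is. Returns 0 on success
 * or the error number from the failing pthread call. */
int run_control_op(ctl_fn fn, void *arg, void **result)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 0 };
    pthread_t tid;
    int err;

    assert(fn != NULL);

    err = pthread_attr_init(&attr);
    if (err)
        return err;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread would inherit the
     * caller's (possibly RT) policy and priority. */
    err = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(&attr, SCHED_OTHER);
    if (!err)
        err = pthread_attr_setschedparam(&attr, &sp);
    if (!err)
        err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err)
        return err;

    /* The caller blocks here; it sleeps rather than spins. */
    return pthread_join(tid, result);
}

/* Stand-in for the real control operation: reports the scheduling policy
 * of the thread it runs on (on Linux, pid 0 means the calling thread). */
void *report_policy(void *unused)
{
    (void)unused;
    return (void *)(intptr_t)sched_getscheduler(0);
}
```

Calling run_control_op(report_policy, NULL, &res) should leave SCHED_OTHER in
res even when the caller runs under SCHED_FIFO. The join still blocks the
caller, so a real RT task would probably queue the request and pick up the
result later instead of waiting inside its time-critical path.
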
