On Mon, 16 Jul 2018 09:31:06 +0200 André Pribil <a.pri...@beck-ipc.com> wrote:
> Hello,
>
> I'm using kernel 4.14.52-rt34 on a single-core ARM system, and I'm seeing a
> deadlock inside the kernel when two RT processes make calls with just the
> wrong timing. The first process is trying to bring the Ethernet interface
> up with the SIOCSIFFLAGS ioctl(). The second process is checking the
> Ethernet carrier, speed and duplex status by reading, e.g.,
> "/sys/class/net/eth1/speed".
>
> The first process eventually reaches phy_poll_reset() in
> drivers/net/phy/phy_device.c, where it calls msleep(50). It never returns
> from the sleep.
>
> The second process reaches speed_show() in net/core/net-sysfs.c. It tries
> to take the RTNL lock with rtnl_trylock(), fails, and calls
> restart_syscall(). This happens over and over again.
>
> It seems that the first process is no longer scheduled and cannot release
> the RTNL lock, while the second process is busy restarting the syscall. The
> first process has a higher RT priority than the second process.
>
> Just for testing, I added the TIF_NEED_RESCHED flag to the
> restart_syscall() function, and I did not see the deadlock again with this
> change:
>
> static inline int restart_syscall(void)
> {
> 	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
> 	return -ERESTARTNOINTR;
> }
>
> As a second test, I released the RTNL lock while calling msleep() in
> phy_poll_reset(). This also made the problem disappear.
>
> I found this thread, where a similar issue with restart_syscall() was
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
>
> Any ideas how to fix this issue?
>
> Andre

Don't do control operations from RT processes! There can be cases of
priority inversion where an RT process is waiting for something that
requires a kthread to complete the operation.
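A minimal userspace sketch of one way to follow that advice, assuming the status polling has to stay inside an RT process: hand the blocking sysfs read to a short-lived SCHED_OTHER helper thread. If rtnl_trylock() keeps failing and the syscall keeps restarting, the retry loop then runs at normal priority and cannot starve RT kernel threads. The caller just sleeps in pthread_join(). The names read_sysfs_long_non_rt(), sysfs_read_worker() and struct sysfs_read_req are hypothetical. This does not change the kernel behaviour described above. It only keeps the RT task off that path.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical request block passed to the helper thread. */
struct sysfs_read_req {
	const char *path;	/* e.g. "/sys/class/net/eth1/speed" */
	long value;		/* parsed integer on success */
	int err;		/* 0 on success, errno-style code otherwise */
};

/* Runs in the SCHED_OTHER helper: open, parse one integer, close. */
static void *sysfs_read_worker(void *arg)
{
	struct sysfs_read_req *req = arg;
	FILE *f = fopen(req->path, "r");

	if (!f) {
		req->err = errno;
		return NULL;
	}
	/* A failed read (e.g. -EINVAL from the kernel) shows up as a parse failure. */
	req->err = (fscanf(f, "%ld", &req->value) == 1) ? 0 : EIO;
	fclose(f);
	return NULL;
}

/*
 * Hypothetical helper: read an integer attribute from a non-RT thread,
 * even when the caller itself runs under SCHED_FIFO or SCHED_RR.
 * Returns 0 and stores the value in *out, or an errno-style code.
 */
int read_sysfs_long_non_rt(const char *path, long *out)
{
	pthread_attr_t attr;
	struct sched_param sp = { .sched_priority = 0 };
	struct sysfs_read_req req = { .path = path, .value = 0, .err = 0 };
	pthread_t tid;
	int rc;

	rc = pthread_attr_init(&attr);
	if (rc)
		return rc;
	/* Without EXPLICIT_SCHED the new thread would inherit the caller's RT policy. */
	rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	if (!rc)
		rc = pthread_attr_setschedpolicy(&attr, SCHED_OTHER);
	if (!rc)
		rc = pthread_attr_setschedparam(&attr, &sp);
	if (!rc)
		rc = pthread_create(&tid, &attr, sysfs_read_worker, &req);
	pthread_attr_destroy(&attr);
	if (rc)
		return rc;

	/* The RT caller blocks here instead of spinning on the restarted syscall. */
	rc = pthread_join(tid, NULL);
	if (rc)
		return rc;
	if (req.err)
		return req.err;
	*out = req.value;
	return 0;
}
```

For example, read_sysfs_long_non_rt("/sys/class/net/eth1/speed", &speed) either stores the value and returns 0, or returns an errno-style code if the file cannot be opened or parsed. Lowering a thread to SCHED_OTHER needs no special privileges. A longer-lived non-RT monitoring thread or a separate non-RT process would follow the same idea without creating a thread per read.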