On Thu, Jun 20, 2019 at 09:41:30AM -0400, Willem de Bruijn wrote:
> On Wed, Jun 19, 2019 at 4:26 PM Neil Horman <nhor...@tuxdriver.com> wrote:
> >
> > When an application is run that:
> > a) Sets its scheduler to be SCHED_FIFO
> > and
> > b) Opens a memory mapped AF_PACKET socket, and sends frames with the
> > MSG_DONTWAIT flag cleared, its possible for the application to hang
> > forever in the kernel. This occurs because when waiting, the code in
> > tpacket_snd calls schedule, which under normal circumstances allows
> > other tasks to run, including ksoftirqd, which in some cases is
> > responsible for freeing the transmitted skb (which in AF_PACKET calls a
> > destructor that flips the status bit of the transmitted frame back to
> > available, allowing the transmitting task to complete).
> >
> > However, when the calling application is SCHED_FIFO, its priority is
> > such that the schedule call immediately places the task back on the cpu,
> > preventing ksoftirqd from freeing the skb, which in turn prevents the
> > transmitting task from detecting that the transmission is complete.
> >
> > We can fix this by converting the schedule call to a completion
> > mechanism. By using a completion queue, we force the calling task, when
> > it detects there are no more frames to send, to schedule itself off the
> > cpu until such time as the last transmitted skb is freed, allowing
> > forward progress to be made.
> >
> > Tested by myself and the reporter, with good results
> >
> > Appies to the net tree
> >
> > Signed-off-by: Neil Horman <nhor...@tuxdriver.com>
> > Reported-by: Matteo Croce <mcr...@redhat.com>
> > CC: "David S. Miller" <da...@davemloft.net>
> > ---
>
> This is a complex change for a narrow configuration. Isn't a
> SCHED_FIFO process preempting ksoftirqd a potential problem for other
> networking workloads as well? And the right configuration to always
> increase ksoftirqd priority when increasing another process's
> priority? Also, even when ksoftirqd kicks in, isn't some progress
> still made on the local_bh_enable reached from schedule()?
>
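
For readers following along, the completion-based wait described in the
quoted patch description can be illustrated with a rough userspace
analogy. This is only a sketch using POSIX threads, not the kernel code
from the patch: the names tx_completion, tx_wait, tx_complete,
reclaim_frame and send_and_wait are all hypothetical, and the reclaim
thread merely stands in for whatever context ends up freeing the skb.
The point it tries to show is that the sender sleeps until it is woken,
rather than yielding and being placed straight back on the cpu.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame states, loosely modelled on a TX ring status bit. */
enum frame_status { FRAME_SENDING, FRAME_AVAILABLE };

/* Userspace stand-in for a kernel completion (assumed names throughout). */
struct tx_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

struct frame {
    enum frame_status    status;
    struct tx_completion sent;
};

void tx_completion_init(struct tx_completion *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Sender side: sleep, off the cpu, until tx_complete() has run. */
void tx_wait(struct tx_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Reclaim side: mark the work finished and wake the sleeping sender. */
void tx_complete(struct tx_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for the destructor that flips the frame back to available. */
void *reclaim_frame(void *arg)
{
    struct frame *f = arg;

    f->status = FRAME_AVAILABLE;
    tx_complete(&f->sent);
    return NULL;
}

/* Queue one frame, then block until it has been reclaimed. */
int send_and_wait(void)
{
    struct frame f = { .status = FRAME_SENDING };
    pthread_t reclaimer;

    tx_completion_init(&f.sent);
    if (pthread_create(&reclaimer, NULL, reclaim_frame, &f) != 0)
        return -1;
    tx_wait(&f.sent);
    pthread_join(reclaimer, NULL);
    return f.status;
}
```

Calling send_and_wait() returns FRAME_AVAILABLE once the reclaim thread
has run. Because the sender blocks on the condition variable instead of
spinning through a yield, the other thread gets to run regardless of the
sender's priority, which is the property the patch is after.
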
A few questions here to answer:

Regarding other protocols having this problem, that's not the case,
because non-packet sockets honor the SO_SNDTIMEO option here (i.e. they
sleep for the period of time specified by the SNDTIMEO option if
MSG_DONTWAIT isn't set). We could certainly do that, but the current
implementation doesn't (opting instead to wait indefinitely until the
respective packet(s) have transmitted or errored out), and I wanted to
maintain that behavior. If there is consensus that packet sockets
should honor SNDTIMEO, then I can certainly do that.

As for progress made by calling local_bh_enable, my read of the code
doesn't have the scheduler calling local_bh_enable at all. Instead,
schedule uses preempt_disable/preempt_enable_no_resched() to gain
exclusive access to the cpu, which ignores pending softirqs on
re-enablement. Perhaps that needs to change, but I'm averse to making
scheduler changes for this (the aforementioned concern about complex
changes for a narrow use case).

Regarding raising the priority of ksoftirqd, that could be a solution,
but the priority would need to be raised to a high-priority SCHED_FIFO
parameter, and that gets back to making complex changes for a narrow
problem domain.

As for the complexity of the solution, I think this is, given your
comments, the least complex and intrusive change to solve the given
problem. We need to find a way to force the calling task off the cpu
while the asynchronous operations in the transmit path complete, and we
can do that this way, or by honoring SO_SNDTIMEO. I'm fine with doing
the latter, but I didn't want to alter the current protocol behavior
without consensus on that.

Regards
Neil
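
For reference, this is roughly how an application sets the send timeout
discussed above from userspace. It is a minimal sketch: the helper names
set_send_timeout_ms and get_send_timeout_ms are hypothetical, and only
the setsockopt/getsockopt calls with SOL_SOCKET and SO_SNDTIMEO are the
real interface. As noted above, the memory mapped AF_PACKET transmit
path does not currently honor this option, so the sketch only shows the
mechanism that other socket types use to bound a blocking send.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: bound blocking sends on fd to ms milliseconds. */
int set_send_timeout_ms(int fd, long ms)
{
    struct timeval tv;

    assert(ms >= 0);
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: read the timeout back, or -1 on error. */
long get_send_timeout_ms(int fd)
{
    struct timeval tv;
    socklen_t len = sizeof(tv);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) != 0)
        return -1;
    return tv.tv_sec * 1000L + tv.tv_usec / 1000;
}
```

A caller would create a socket, call set_send_timeout_ms(fd, 2000), and
then issue blocking sends as usual. The kernel may round sub-second
values to its timer granularity, so a value read back with
get_send_timeout_ms() is only guaranteed to match exactly for whole
seconds.
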