On Wed, May 20, 2020 at 05:51:27AM -0700, Eric Dumazet wrote:
>
> On 5/19/20 11:42 PM, Ahmed S. Darwish wrote:
> > Hello Eric,
> >
> > On Tue, May 19, 2020 at 07:01:38PM -0700, Eric Dumazet wrote:
> >>
> >> On 5/19/20 2:45 PM, Ahmed S. Darwish wrote:
> >>> Sequence counters write paths are critical sections that must never be
> >>> preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.
> >>>
> >>> Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
> >>> netdev name retrieval.") handled a deadlock, observed with
> >>> CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
> >>> infinitely spinning: it got scheduled after the seqcount write side
> >>> blocked inside its own critical section.
> >>>
> >>> To fix that deadlock, among other issues, the commit added a
> >>> cond_resched() inside the read side section. While this will get the
> >>> non-preemptible kernel eventually unstuck, the seqcount reader is fully
> >>> exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.
> >>>
> >>> The fix is also still broken: if the seqcount reader belongs to a
> >>> real-time scheduling policy, it can spin forever and the kernel will
> >>> livelock.
> >>>
> >>> Disabling preemption over the seqcount write side critical section will
> >>> not work: inside it are a number of GFP_KERNEL allocations and mutex
> >>> locking through the drivers/base/ :: device_rename() call chain.
> >>>
> >>> From all the above, replace the seqcount with a rwsem.
> >>>
> >>> Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and
> >>> netdev name retrieval.)
> >>> Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
> >>> Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to
> >>> return an interface name)
> >>> Cc: <sta...@vger.kernel.org>
> >>> Signed-off-by: Ahmed S. Darwish <a.darw...@linutronix.de>
> >>> Reviewed-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
> >>> ---
> >>>  net/core/dev.c | 30 ++++++++++++------------------
> >>>  1 file changed, 12 insertions(+), 18 deletions(-)
> >>>
> >>
> >> Seems fine to me, assuming rwsem prevent starvation of the writer.
> >>
> >
> > Thanks for the review.
> >
> > AFAIK, due to 5cfd92e12e13 ("locking/rwsem: Adaptive disabling of reader
> > optimistic spinning"), using a rwsem shouldn't lead to writer starvation
> > in the contended case.
>
> Hmm this was in linux-5.3, so very recent stuff.
>
> Has this patch been backported to stable releases ?
>
> With all the Fixes: tags you added, stable teams will backport this
> networking patch to all stable versions.
>
> Do we have a way to tune a dedicare rwsem to 'give preference to the
> (unique in this case) writer" over a myriad of potential readers ?
>
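A minimal userspace sketch of the pattern the patch describes, assuming POSIX pthread_rwlock_t as a stand-in for the kernel rwsem: readers copy the name under a shared lock, while the writer is free to block (here, an allocation) inside its exclusive section, something a seqcount write side must never do. The names dev_name, NAME_LEN, rename_lock_init(), dev_rename() and dev_get_name() are hypothetical and are not the actual net/core/dev.c code. The glibc-only pthread_rwlockattr_setkind_np() call is used to approximate the rwsem property under discussion, that a waiting writer makes newly arriving readers block; glibc's default rwlock kind prefers readers instead.

```c
#define _GNU_SOURCE /* for pthread_rwlockattr_setkind_np(), a glibc extension */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size, mirroring the kernel's IFNAMSIZ (16). */
#define NAME_LEN 16

/* Hypothetical stand-ins for a device name and the lock guarding renames. */
static pthread_rwlock_t rename_lock;
static char dev_name[NAME_LEN] = "eth0";

/* Configure the lock so that a waiting writer blocks newly arriving readers. */
static void rename_lock_init(void)
{
	pthread_rwlockattr_t attr;
	int rc;

	pthread_rwlockattr_init(&attr);
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	rc = pthread_rwlock_init(&rename_lock, &attr);
	assert(rc == 0);
	(void)rc;
	pthread_rwlockattr_destroy(&attr);
}

/* Writer: unlike a seqcount write side, it may block while holding the lock. */
static int dev_rename(const char *newname)
{
	int ret = 0;

	pthread_rwlock_wrlock(&rename_lock);
	/* A potentially blocking allocation inside the write section is fine here. */
	char *tmp = calloc(1, NAME_LEN);
	if (tmp) {
		snprintf(tmp, NAME_LEN, "%s", newname);
		memcpy(dev_name, tmp, NAME_LEN);
		free(tmp);
	} else {
		ret = -1;
	}
	pthread_rwlock_unlock(&rename_lock);
	return ret;
}

/* Reader: blocks while a writer holds the lock instead of retrying in a loop. */
static void dev_get_name(char *buf)
{
	pthread_rwlock_rdlock(&rename_lock);
	memcpy(buf, dev_name, NAME_LEN);
	pthread_rwlock_unlock(&rename_lock);
}
```

For example, after rename_lock_init() and dev_rename("veth1"), dev_get_name() copies out "veth1". Inside the kernel, the corresponding primitives would be down_read()/up_read() and down_write()/up_write() on a struct rw_semaphore.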
I was wrong in referencing the commit 5cfd92e12e13 above. Both before
and after that commit, once a writer is blocked waiting on the rwsem,
all subsequent readers will block until that writer makes progress.

Given that behavior, and that the read section is already quite short,
I don't think writers are at any risk of starvation here.

(A v2 will be sent shortly, fixing the error found by Dan/kbuild-bot.)

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH