On 3 January 2017 at 20:08, Ted Unangst <t...@tedunangst.com> wrote: > Martin Pieuchot wrote: >> It seems that most of the problems exposed by the introduction of the >> NET_LOCK() are related to the non-recursive nature of the rwlock. Some >> known issues involve pflow(4), cloning interfaces an NFS. >> >> Diff below makes use of a recursive-rwlock instead. I just finished a >> build on NFS with it, so I'd like to know if it works for your use case. > > I don't want to interfere with progress, so if you think this is best, carry > on. > > But I'll note that using recursive locks leads to sloppy locking discipline, > which is exactly how we find ourselves in this mess today. Need the lock? Not > sure? Grab it anyway. > > The rrwlock was added for the particular use case of vfs locking, which is > already a recursive cluster fuck, but wasn't supposed to be used for new code. > Of course, the network stack also qualifies as legacy code, so can't object > much to using it there either. > > I imagine you've already thought of this. :) >
The big problem with the non-recursive netlock is that to fix recursions in many cases we have to release the netlock in one of the callers up the chain which makes the interim code run without any protection and we don't have a way of assessing the situation short of code inspection which is fallible. Sprinkling NET_ASSERT_LOCKED() everywhere is possible, however we still end up with NET_LOCK/UNLOCK chains in the code that was previously running uninterrupted. Does this outweigh the desire not to use recursive rwlocks anywhere else?