On Sun, Jun 19, 2022 at 11:05:38AM +0200, Jeremie Courreges-Anglas wrote:
> On Fri, Jun 17 2022, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> > On Thu, Jun 16 2022, Visa Hankala <v...@hankala.org> wrote:
> >> nfs_inactive() has a lock order reversal. When it removes the silly
> >> file, it locks the directory vnode while it already holds the lock
> >> of the argument file vnode. This clashes for example with name lookups
> >> where directory vnodes are locked before file vnodes.
> >>
> >> The reversal can cause a deadlock when an NFS client has multiple
> >> processes that create, modify and remove files in the same
> >> NFS directory.
> >>
> >> The following patch makes the silly file removal happen after
> >> nfs_inactive() has released the file vnode lock. This should be safe
> >> because the silly file removal is independent of nfs_inactive()'s
> >> argument vnode.
>
> The diff makes sense to me. Did you spot it reviewing the code, or
> using WITNESS?
I noticed it by code review. WITNESS is somewhat helpless with vnode
locks because they can involve multiple levels of lock nesting. In fact,
the order checking between vnodes has been disabled by initializing the
locks with RWL_IS_VNODE. To fix this, the kernel would have to pass
nesting information around the filesystem code.

This particular deadlock can be triggered for example by quickly writing
and removing temporary files in an NFS directory using one process while
another process lists the directory contents repeatedly.

> >> Could some NFS users test this?
> >
> > I'm running this diff on the riscv64 build cluster, since 1h25mn with no
> > hang. Let's see how it goes.
>
> This run did finish properly yesterday.
>
> > This cluster doesn't use NFS as much as it could (build logs stored
> > locally) but I can try that in the next build.
>
> So I have restarted a build with this diff and dpb(1) logging on an
> NFS-mounted /usr/ports/logs. I get a dpb(1) hang after 1400+ packages
> built. Any other attempt to access the NFS-mounted filesystem results
> in a hang. Let me know if I can extract more data from the system.

No need this time. Those wait messages give some pointers.

> shannon ~$ grep nfs riscv64/nfs-hang.txt
> 97293   72036  49335     0  3        0x91  nfs_fsync  perl
> 69045   83700  64026    55  3        0x82  nfs_fsync  c++
> 80365   37354  15104    55  3    0x100082  nfs_fsync  make
> 28876  139812  59322    55  3    0x100082  nfs_fsync  make
>  6160  193238  61541  1000  3    0x100003  nfsnode    ksh
>  7535  421732      0     0  3     0x14280  nfsrcvlk   nfsio
> 70437  237308      0     0  3     0x14280  nfsrcvlk   nfsio
> 97073  406345      0     0  3     0x14200  nfsrcvlk   nfsio
> 88487  390804      0     0  3     0x14200  nfsrcvlk   nfsio
> 58945   91139  92962     0  3        0x80  nfsd       nfsd
> 75619  357314  92962     0  3        0x80  nfsd       nfsd
> 39027  137228  92962     0  3        0x80  nfsd       nfsd
> 22028  406380  92962     0  3        0x80  nfsd       nfsd
> 92962   11420      1     0  3        0x80  netcon     nfsd
> 90467  310188      0     0  3     0x14280  nfsrcvlk   update
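
For anyone following the locking argument without the original diff at
hand, the shape of the proposed change is roughly the following. This is
only a sketch based on the description quoted above, not the actual diff:
the names (nfsnode, n_sillyrename, nfs_removeit(), VOP_UNLOCK()) come from
the in-tree NFS code, and the rest of nfs_inactive()'s cleanup is elided.

int
nfs_inactive(void *v)
{
	struct vop_inactive_args *ap = v;
	struct nfsnode *np = VTONFS(ap->a_vp);
	struct sillyrename *sp = NULL;

	/* Detach the silly rename state while the file vnode is locked. */
	if (ap->a_vp->v_type != VDIR) {
		sp = np->n_sillyrename;
		np->n_sillyrename = NULL;
	}

	/* ... other cleanup of the nfsnode elided ... */

	/* Drop the file vnode lock before touching the directory. */
	VOP_UNLOCK(ap->a_vp);

	if (sp != NULL) {
		/*
		 * nfs_removeit() locks the parent directory vnode. Doing
		 * it only after the unlock means the file vnode lock is
		 * no longer held here, so the file -> directory lock
		 * order that clashes with name lookups never occurs.
		 */
		nfs_removeit(sp);
		crfree(sp->s_cred);
		vrele(sp->s_vp);
		free(sp, M_NFSREQ, sizeof(*sp));
	}
	return (0);
}

The point is only the ordering: the sillyrename state is unhooked from the
nfsnode while the file vnode is still locked, and the directory-side work
happens once that lock has been released.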