I do this for both NFSv3 and NFSv4, but all my underlying filesystems are ZFS, and that is what originally prompted me to start setting fsid. It may be irrelevant for NFSv3 and/or non-ZFS filesystems.
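
For illustration only, here is a rough sketch of what an export with an explicit fsid could look like in /etc/exports. The /local path is the one from this thread; the client range and the fsid value are made-up placeholders, and the other options are just common choices rather than anything confirmed for this cluster.

```
# /etc/exports (sketch; client range and fsid value are assumptions)
# fsid= gives the export a fixed filesystem identifier, so the server does
# not have to derive one from the underlying device.
/local  10.0.0.0/24(rw,sync,no_subtree_check,fsid=101)
```

After editing the file, `exportfs -ra` re-reads it. Each exported filesystem on a given server should get its own distinct fsid value.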

jbh

On Wed, Apr 19, 2017 at 9:13 PM Prentice Bisbal <pbis...@pppl.gov> wrote:

> Even with NFSv3? It seems like fsid=0 is required for NFSv4, but does it
> have any impact on NFSv3? I honestly am not an expert on the details of
> NFS. For me, it's always "just worked", and performance was never an
> issue, so I never had much reason to dig into the details of
> tweaking/debugging/optimizing NFS.
>
> Prentice
>
> On 04/19/2017 02:07 PM, John Hanks wrote:
>
> I've had far fewer unexplained (although admittedly there was a limited
> search for the guilty) NFS issues since I started using fsid= in my NFS
> exports. If you aren't setting that it might be worth a try. NFS seems to
> be much better at recovering from problems with an fsid assigned to the
> root of exports.
>
> jbh
>
> On Wed, Apr 19, 2017 at 8:58 PM Prentice Bisbal <pbis...@pppl.gov> wrote:
>
>> Here's the sequence of events:
>>
>> 1. First job(s) run fine on the node and complete without error.
>>
>> 2. Eventually a job fails with a 'permission denied' error when it tries
>> to access /l/hostname.
>>
>> Since no jobs fail with a file I/O error, it's hard to confirm that the
>> jobs themselves are causing the problem. However, if these particular
>> jobs are the only thing running on the cluster and should be the only
>> jobs accessing these NFS shares, what else could be causing them?
>>
>> All these systems are getting their user information from LDAP. Since
>> some jobs run before these errors appear, lack of, or inaccurate, user
>> info doesn't seem to be a likely source of this problem, but I'm not
>> ruling anything out at this point.
>>
>> Important detail: This is NFSv3.
>>
>> Prentice Bisbal
>> Lead Software Engineer
>> Princeton Plasma Physics Laboratory
>> http://www.pppl.gov
>>
>> On 04/19/2017 12:20 PM, Ryan Novosielski wrote:
>> > Are you saying they can’t mount the filesystem, or they can’t write to
>> > a mounted filesystem? Where does this system get its user information
>> > from, if the latter?
>> >
>> > --
>> > ____
>> > || \\UTGERS,     |---------------------------*O*---------------------------
>> > ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
>> > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> > ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>> >      `'
>> >
>> >> On Apr 19, 2017, at 12:09, Prentice Bisbal <pbis...@pppl.gov> wrote:
>> >>
>> >> Beowulfers,
>> >>
>> >> I've been trying to troubleshoot a problem for the past two weeks with
>> >> no luck. We have a cluster here that runs only one application
>> >> (although the details of that application change significantly from
>> >> run to run). Each node in the cluster has an NFS export, /local, that
>> >> can be automounted by every other node in the cluster as /l/hostname.
>> >>
>> >> Starting about two weeks ago, when jobs would try to access
>> >> /l/hostname, they would get permission denied messages. I tried
>> >> analyzing this problem by turning on all NFS/RPC logging with rpcdebug
>> >> and also using tcpdump while trying to manually mount one of the
>> >> remote systems. Both approaches indicated stale file handles were
>> >> preventing the share from being mounted.
>> >>
>> >> Since it has been 6-8 weeks since there were any seemingly relevant
>> >> system config changes, I suspect it's an application problem
>> >> (naturally). On the other hand, the application developers/users
>> >> insist that they haven't made any changes to their code, either. To be
>> >> honest, there's no significant evidence indicating either is at fault.
>> >> Any suggestions on how to debug this and definitively find the root
>> >> cause of these stale file handles?
>> >>
>> >> --
>> >> Prentice
>> >> _______________________________________________
>> >> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> >> http://www.beowulf.org/mailman/listinfo/beowulf
>>
> --
> ‘[A] talent for following the ways of yesterday, is not sufficient to
> improve the world of today.’
> - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC

--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
 - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf