It was just a couple of weeks ago that I was trying to help others with NFS, and here I am asking something myself! doh!
Anyways, I noticed recently that my NFS server at home seems to have trouble with locking. I have 2 clients which use it to host home directories (1 Debian woody, 1 SuSE 8). I first noticed it about a week ago when trying to load gnp (gnome notepad, my favorite X editor): it didn't load, it just hung, and I was getting this in my local (client) kernel log:

Aug 25 13:56:37 aphro kernel: lockd: task 173568 can't get a request slot
Aug 25 13:57:59 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:58:49 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:59:39 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:00:29 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:01:19 aphro kernel: lockd: task 173597 can't get a request slot

and this in my server kernel log:

lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5

One website said this is the result of an overloaded server, but I don't think it's overloaded with only 2 clients (usually only 1 of which is using it at a time, since these systems are on the same KVM). I can usually work around it short-term by restarting the NFS services. Not many apps seem to be affected by it: gnome-terminal works fine, AfterStep is fine, Mozilla and Opera are fine, StarOffice 6 is fine. I can only assume that they either don't care about locking or do it in another manner.
I have the NFS server (Debian 3.0 / 2.2.19 / using kernel NFS) set to load 19 NFS server processes; it also loads the lockd service (kernel level).

(querying the server from the client):

[root@aphro:~]# rpcinfo -p gateway
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  19662  status
    100024    1   tcp   7617  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  19663  nlockmgr
    100021    3   udp  19663  nlockmgr
    100021    4   udp  19663  nlockmgr
    100005    1   udp  19664  mountd
    100005    1   tcp   7618  mountd
    100005    2   udp  19664  mountd
    100005    2   tcp   7618  mountd
    100005    3   udp  19664  mountd
    100005    3   tcp   7618  mountd

(querying the client from the client):

[root@aphro:~]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp   1024  nlockmgr
    100021    3   udp   1024  nlockmgr
    100021    4   udp   1024  nlockmgr
    100024    1   udp   1025  status
    100024    1   tcp   1025  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100005    1   udp   1026  mountd
    100005    1   tcp   1026  mountd
    100005    2   udp   1026  mountd
    100005    2   tcp   1026  mountd
    100005    3   udp   1026  mountd
    100005    3   tcp   1026  mountd

Running nfsstat on the server shows the following results:

Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
11900099   1420       0          1420       0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
15       0%  7292735 61%  171766   1%  625793   5%  1426891 11%  389      0%
read         write        create       mkdir        symlink      mknod
830197   6%  1053611  8%  150175   1%  2889     0%  979      0%  3        0%
remove       rmdir        rename       link         readdir      readdirplus
132602   1%  3179     0%  1195     0%  333      0%  18594    0%  2901     0%
fsstat       fsinfo       pathconf     commit
395      0%  305      0%  0        0%  185152   1%

(I have the clients mounting the filesystem with the option nfsvers=3; my next thing to try is to switch to nfsvers=2 and see if it helps at all. All other stats reported by nfsstat are 0.)

All 3 machines are on the same VLAN of my Summit 48-port switch with a 17-gig backplane, so I am certain there are no bandwidth issues.
One website recommended doing a ping -f to the server/client and checking for packet loss. I did it anyways just to see the results:

server to client:

--- aphro.aphroland.org ping statistics ---
60496 packets transmitted, 60494 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/3.4 ms

client to server:

--- gateway.aphroland.org ping statistics ---
78989 packets transmitted, 78983 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.2/44.0 ms

The server is: P3-800, 1GB RAM, dual Western Digital 100GB Special Edition (8MB cache each) drives in RAID1, 2.2.19 kernel.
Client1 is: Athlon 1300, 768MB RAM, 9.1GB UltraWide SCSI disk, 2.2.19 kernel.
Client2 is: P3-500, 512MB RAM, 12GB IBM IDE disk, 2.4.18 kernel.

One thing that is curious: I ran lsof to see the open ports used by rpc.statd. It is using 2 at the moment, one of which is 7617/udp. I ran a UDP nmap scan against localhost and nmap reported that port was closed; I ran an nmap scan against that same port from my client and it reported the port open. My firewalling rules only affect the eth0 interface, so I am not sure why statd stops responding to localhost connections, which seems to be the heart of the problem?

My RPC firewall rules:

PORTS="`rpcinfo -p | awk '{print $4}' | grep '[0-9]'`"
for rpcport in $PORTS
do
    /sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p tcp -i eth0
    /sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p udp -i eth0
done

The 2nd port that rpc.statd is listening on (807/udp) is reported to be open by a UDP nmap scan against localhost on the server:

[root@portal:/etc/init.d]# nmap -sU -vv -p 807,7617 localhost
Starting nmap V. 2.54BETA31 ( www.insecure.org/nmap/ )
Host debian (127.0.0.1) appears to be up ... good.
Initiating UDP Scan against debian (127.0.0.1)
The UDP Scan took 2 seconds to scan 2 ports.
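Since an nmap UDP scan only tells you whether an ICMP port-unreachable came back, statd can be probed more directly with an ONC RPC NULL call (program 100024, procedure 0, per RFC 1057): a live statd answers it immediately, a wedged one times out. Here is a rough sketch I put together for that; the 0x1234 xid is arbitrary and the port has to be taken from your own rpcinfo output:

```python
# Probe rpc.statd with an ONC RPC NULL call (procedure 0) over UDP.
# Program 100024 is 'status' (statd); a live statd replies at once,
# a wedged or unreachable one makes the call time out.
import socket
import struct

def rpc_null_call(xid, prog, vers):
    # ONC RPC call header (RFC 1057): xid, msg_type=CALL(0), rpcvers=2,
    # prog, vers, proc=NULL(0), followed by null AUTH cred and verf
    # (flavor=AUTH_NULL(0), length=0, twice).
    return struct.pack(">6I", xid, 0, 2, prog, vers, 0) + \
           struct.pack(">4I", 0, 0, 0, 0)

def probe_statd(host, port, timeout=5):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    s.sendto(rpc_null_call(0x1234, 100024, 1), (host, port))
    try:
        reply = s.recv(1024)
    except socket.timeout:
        return False
    # A valid reply echoes our xid and carries msg_type=REPLY(1).
    rxid, mtype = struct.unpack(">2I", reply[:8])
    return rxid == 0x1234 and mtype == 1
```

Running probe_statd("localhost", 807) vs. probe_statd from a client against the server's statd port would show whether it's really only the localhost side that statd has stopped answering on.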
Adding open port 807/udp
Interesting ports on debian (127.0.0.1):
(The 1 port scanned but not shown below is in state: closed)
Port       State       Service
807/udp    open        unknown
Nmap run completed -- 1 IP address (1 host up) scanned in 2 seconds

thanks for any ideas!

nate