It was just a couple of weeks ago that I was trying to help others with NFS, and here I am asking something myself! doh!
Anyways, I noticed recently that my NFS server at home seems to have trouble with locking. I have 2 clients which use it to host home directories (1 Debian woody, 1 SuSE 8). I first noticed it about a week ago when trying to load gnp (gnome notepad, my favorite X editor): it didn't load, it just hung, and I was getting this in my local (client) kernel log:

Aug 25 13:56:37 aphro kernel: lockd: task 173568 can't get a request slot
Aug 25 13:57:59 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:58:49 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:59:39 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:00:29 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:01:19 aphro kernel: lockd: task 173597 can't get a request slot

and this in my server kernel log:

lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5

One website said this is the result of an overloaded server, but I don't think it's overloaded with only 2 clients (usually only 1 of which is using it at a time, since these systems are on the same KVM). I can usually work around it short-term by restarting the NFS services. Not many apps seem to be affected by it: gnome-terminal works fine, AfterStep is fine, Mozilla and Opera are fine, StarOffice 6 is fine. I can only assume that they either don't care about locking or do it in another manner.
I have the NFS server (Debian 3.0 / 2.2.19 / using kernel NFS) set to load 19 NFS server processes; it also loads the lockd service (kernel level).

(querying the server from the client):

[root@aphro:~]# rpcinfo -p gateway
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  19662  status
    100024    1   tcp   7617  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  19663  nlockmgr
    100021    3   udp  19663  nlockmgr
    100021    4   udp  19663  nlockmgr
    100005    1   udp  19664  mountd
    100005    1   tcp   7618  mountd
    100005    2   udp  19664  mountd
    100005    2   tcp   7618  mountd
    100005    3   udp  19664  mountd
    100005    3   tcp   7618  mountd

(querying the client from the client):

[root@aphro:~]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp   1024  nlockmgr
    100021    3   udp   1024  nlockmgr
    100021    4   udp   1024  nlockmgr
    100024    1   udp   1025  status
    100024    1   tcp   1025  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100005    1   udp   1026  mountd
    100005    1   tcp   1026  mountd
    100005    2   udp   1026  mountd
    100005    2   tcp   1026  mountd
    100005    3   udp   1026  mountd
    100005    3   tcp   1026  mountd

Running nfsstat on the server shows the following results:

Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
11900099   1420       0          1420       0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
15       0%  7292735 61%  171766   1%  625793   5%  1426891 11%  389      0%
read         write        create       mkdir        symlink      mknod
830197   6%  1053611  8%  150175   1%  2889     0%  979      0%  3        0%
remove       rmdir        rename       link         readdir      readdirplus
132602   1%  3179     0%  1195     0%  333      0%  18594    0%  2901     0%
fsstat       fsinfo       pathconf     commit
395      0%  305      0%  0        0%  185152   1%

(I have the clients mounting the filesystem with the option nfsvers=3; my next thing to try is to switch to nfsvers=2 and see if it helps at all. All other stats reported by nfsstat are 0.)

All 3 machines are on the same VLAN of my Summit 48-port switch with a 17-gig backplane, so I am certain there are no bandwidth issues.
One website recommended doing a ping -f to the server/client and checking for packet loss. I did it anyways just to see the results:

server to client:

--- aphro.aphroland.org ping statistics ---
60496 packets transmitted, 60494 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/3.4 ms

client to server:

--- gateway.aphroland.org ping statistics ---
78989 packets transmitted, 78983 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.2/44.0 ms

The server is: P3-800, 1GB RAM, dual Western Digital 100GB Special Edition (8MB cache each) drives in RAID1, 2.2.19 kernel.
Client1 is: Athlon 1300, 768MB RAM, 9.1GB UltraWide SCSI disk, 2.2.19 kernel.
Client2 is: P3-500, 512MB RAM, 12GB IBM IDE disk, 2.4.18 kernel.

One thing that is curious: I ran lsof to see the open ports used by rpc.statd. It is using 2 at the moment, one of which is 7617/udp. I ran a UDP nmap scan against localhost and nmap reported that port was closed; I ran an nmap scan against that same port from my client and it reported the port open. My firewalling rules only affect the eth0 interface, so I am not sure why statd stops responding to localhost connections, which seems to be the heart of the problem?

My RPC firewall rules:

PORTS="`rpcinfo -p | awk '{print $4}' | grep '[0-9]'`"
for rpcport in $PORTS
do
    /sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p tcp -i eth0
    /sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p udp -i eth0
done

The 2nd port that rpc.statd is listening on (807/udp) is reported to be open by a UDP nmap scan against localhost on the server:

[root@portal:/etc/init.d]# nmap -sU -vv -p 807,7617 localhost
Starting nmap V. 2.54BETA31 ( www.insecure.org/nmap/ )
Host debian (127.0.0.1) appears to be up ... good.
Initiating UDP Scan against debian (127.0.0.1)
The UDP Scan took 2 seconds to scan 2 ports.
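Since an nmap UDP scan only tells you whether an ICMP port-unreachable came back, statd can be probed more directly with an ONC RPC NULL call (program 100024, procedure 0, per RFC 1057): a live statd answers it immediately, a wedged one times out. Here is a rough sketch I put together for that; the 0x1234 xid is arbitrary and the port has to be taken from your own rpcinfo output:

```python
# Probe rpc.statd with an ONC RPC NULL call (procedure 0) over UDP.
# Program 100024 is 'status' (statd); a live statd replies at once,
# a wedged or unreachable one makes the call time out.
import socket
import struct

def rpc_null_call(xid, prog, vers):
    # ONC RPC call header (RFC 1057): xid, msg_type=CALL(0), rpcvers=2,
    # prog, vers, proc=NULL(0), followed by null AUTH cred and verf
    # (flavor=AUTH_NULL(0), length=0, twice).
    return struct.pack(">6I", xid, 0, 2, prog, vers, 0) + \
           struct.pack(">4I", 0, 0, 0, 0)

def probe_statd(host, port, timeout=5):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    s.sendto(rpc_null_call(0x1234, 100024, 1), (host, port))
    try:
        reply = s.recv(1024)
    except socket.timeout:
        return False
    # A valid reply echoes our xid and carries msg_type=REPLY(1).
    rxid, mtype = struct.unpack(">2I", reply[:8])
    return rxid == 0x1234 and mtype == 1
```

Running probe_statd("localhost", 807) vs. probe_statd from a client against the server's statd port would show whether it's really only the localhost side that statd has stopped answering on.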
Adding open port 807/udp
Interesting ports on debian (127.0.0.1):
(The 1 port scanned but not shown below is in state: closed)
Port       State       Service
807/udp    open        unknown
Nmap run completed -- 1 IP address (1 host up) scanned in 2 seconds

thanks for any ideas!

nate