Package: nfs-kernel-server
Version: 1:1.1.4-1
Severity: normal


Description of problem:
Periodically, and with no obvious cause, all NFS connections between our
Debian Testing (_Squeeze_) x86 client (a diskless node which uses nfsroot
and boots from the server) and our Debian Testing (_Squeeze_) x86 server
hang, and dmesg on the client reports that the server is "not
responding".

The server is responding to everyone else's requests. 

Restarting the nfsd on the server doesn't appear to solve the problem.

At first I wasn't able to capture any debug information because /var/log
was mounted over NFS, so I have installed a hard drive that holds only
/var/log, to be able to capture debug logs from the client as well.


Debug Logs: 
http://fixity.net/tmp/client.log.gz - Kernel RPC Debug Log from the client
http://fixity.net/tmp/server.log.gz - Kernel RPC Debug Log from the server


How reproducible:
Happens every time, 10 to 90 minutes after booting the diskless node.


Actual results:
NFS connections stop responding and the system hangs or becomes very slow
and unresponsive (it doesn't respond to Ctrl+Alt+Del either). 60 to 90
minutes after the first server timeout the client reports the server as OK,
but the client is still unresponsive. Immediately after that the client
logs the loss of the server connection again, which leads to a continuous
loop, and the client stays unresponsive. Sometimes the client resumes
normal operation for a couple of hours, but then the problem repeats.


Connectivity info: 
Both the client and the server are connected to a Cisco Metro series
manageable Gigabit Ethernet switch. Both use Intel Pro 82545GM Gigabit
Ethernet Server Controllers. Neither machine logs any Ethernet errors, and
none are logged by the switch.


Expected results:
NFS connections continue to function and don't fail like clockwork when
every other client on the network has no issues.


Client & Server Load:
For the purposes of testing, both machines were running only the necessary
daemons and were otherwise idle.


Client & Server Kernel:
Both the client and the server run a custom-compiled Linux 2.6.29.3 kernel.
Configuration file: http://fixity.net/tmp/config-2.6.29.3.gz


Client & Server IP fragment reassembly memory thresholds (bytes):
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 393216


Client Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1


Client Mount (cat /proc/mounts | grep nfsroot):
10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,namlen=255,
hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,sec=sys,addr=10.11.11.1 0 0
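
For anyone comparing these options against another client, the mount line
above can be split into a dictionary with a few lines of Python. This is
only an illustrative sketch: the helper name parse_mount_line is
hypothetical and not part of any NFS tooling, and the two wrapped lines are
joined back into the single line that /proc/mounts actually prints.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its fields and an options dict."""
    device, mountpoint, fstype, options = line.split()[:4]
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" or "nolock" carry no value; store True for them.
        opts[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
    }


# The wrapped mount entry from this report, joined into one line.
line = (
    "10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,namlen=255,"
    "hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,sec=sys,addr=10.11.11.1 0 0"
)
mount = parse_mount_line(line)
opts = mount["options"]

# timeo is expressed in tenths of a second (see nfs(5)), so convert to seconds.
print(opts["proto"], int(opts["timeo"]) / 10, int(opts["retrans"]))
```

Note that timeo is given in tenths of a second, so the timeo=7 shown here
corresponds to 0.7 seconds.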


Client fstab:
proc            /proc           proc    defaults        0       0
/dev/nfs        /               nfs     defaults        1       1
none            /tmp            tmpfs   defaults        0       0
none            /var/run        tmpfs   defaults        0       0
none            /var/lock       tmpfs   defaults        0       0
none            /var/tmp        tmpfs   defaults        0       0

Client Daemons:
portmap, rpc.statd, rpc.idmapd

Server Daemons:
portmap, rpc.statd, rpc.idmapd, rpc.mountd --manage-gids

Server Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1
nfs-kernel-server/testing uptodate 1:1.1.4-1

Server Export:
/nfsroot 10.11.11.*(rw,no_root_squash,async,no_subtree_check)

Server Options:
RPCNFSDCOUNT=16
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS=--manage-gids
NEED_SVCGSSD=no
RPCSVCGSSDOPTS=no

Additional Info:
Since I have read that tweaking the nfsroot mount options could improve the
situation, I have tested with different options as follows:
rsize/wsize=1024|2048|4096|8192|32768|524288
timeo=15|60|600
retrans=3|10|20
None of them solved the problem.
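
To repeat such a sweep systematically, the option strings can be generated
rather than typed by hand. The sketch below is only an illustration and
rests on assumptions that are not stated in this report: that one option is
varied at a time, that rsize and wsize are always changed together, and that
the baseline is the set of values from the client mount line. It is not
necessarily the exact set of configurations that were booted, and the names
BASELINE, SWEEP and option_strings are hypothetical.

```python
# Assumed baseline, taken from the client mount line in this report.
BASELINE = {"rsize": 524288, "wsize": 524288, "timeo": 7, "retrans": 10}

# Values listed in the report; "rsize" stands for the rsize/wsize pair.
SWEEP = {
    "rsize": [1024, 2048, 4096, 8192, 32768, 524288],
    "timeo": [15, 60, 600],
    "retrans": [3, 10, 20],
}


def option_strings(baseline, sweep):
    """Return unique option strings, varying one option at a time."""
    seen = []
    for name, values in sweep.items():
        for value in values:
            opts = dict(baseline)
            opts[name] = value
            if name == "rsize":
                # Assumption: wsize always follows rsize.
                opts["wsize"] = value
            text = ",".join(f"{key}={val}" for key, val in opts.items())
            if text not in seen:
                seen.append(text)
    return seen


candidates = option_strings(BASELINE, SWEEP)
for text in candidates:
    print(text)
```

Each printed string could then be appended to the options part of the
nfsroot= kernel parameter for one test boot.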

Any help or suggestions on fixing the problem would be highly appreciated. I
have been fighting this problem for the last couple of weeks and have run
out of ideas.



Best Regards,
Jerome Walters



-- System Information:
Debian Release: squeeze/sid
  APT prefers old-stable
  APT policy: (500, 'old-stable'), (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.29.3 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages nfs-kernel-server depends on:
ii  libblkid1            1.41.3-1            block device id library
ii  libc6                2.9-4               GNU C Library: Shared libraries
ii  libcomerr2           1.41.3-1            common error description library
ii  libgssglue1          0.1-2               mechanism-switch gssapi library
ii  libkrb53             1.6.dfsg.4~beta1-13 Transitional library package/krb4 
ii  libnfsidmap2         0.21-2              An nfs idmapping library
ii  librpcsecgss3        0.18-1              allows secure rpc communication us
ii  libwrap0             7.6.q-16            Wietse Venema's TCP wrappers libra
ii  lsb-base             3.2-22              Linux Standard Base 3.2 init scrip
ii  nfs-common           1:1.1.4-1           NFS support files common to client
ii  ucf                  3.0018              Update Configuration File: preserv

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information


