This tale is at an end, I think, because I can't bear to tell it much longer. As many have suggested, there is probably a hardware problem, and since the hardware is old, I will do without the services of the troublesome machines -- it turns out that there is another one acting up as well -- till they are replaced in a couple of weeks.

Many thanks to all who racked their brains for helpful suggestions.

I want to tell a little more of what I have learned, before I drop the subject altogether.

First, I did swap the cable of the bad machine with that of a good one, with no effect on either machine. That eliminates the possibility of the cable or the switch port being bad. Since I had previously changed out the NIC and the switch, the only remaining possibility is something inside the machine itself, probably the motherboard, but possibly a corrupted kernel module for handling udp -- more on that below.

Second, we could find no sign of this failure in any log. Nor did /proc/net/dev show any errors. The suggestion is that these older kernels simply don't detect and report such errors. I think that's related to their doing nfs over udp -- more about that in a moment.

Third, though there is no binary called netcat on these systems, nc (the name the netcat binary is usually installed under) is present. We didn't get around to trying it, because we found ttcp.

Fourth, with ttcp over tcp, I found that the troubled machine could send 800 MB in about 20 seconds -- the wire speed for those 32-bit PCI slots, as measured by netpipe. However, if I used ttcp over udp, I couldn't reliably send even ten 8192-byte blocks! Successive runs would deliver 3, or 1, or 5 of the blocks. Don't ask me how those two facts are compatible. I don't know.
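
For anyone who wants to repeat the comparison, the tests were along these lines (a sketch, assuming the classic ttcp flags -- -t/-r for transmit/receive, -s for the built-in source/sink pattern, -u for udp, -l for buffer length, -n for buffer count; goodhost is a stand-in for the receiving machine):

    # tcp: start the receiver on goodhost, then transmit from the bad machine
    ttcp -r -s                              # on goodhost
    ttcp -t -s -l 8192 -n 100000 goodhost   # on the bad machine (~800 MB)

    # udp: the same pair of commands with -u added
    ttcp -r -s -u                           # on goodhost
    ttcp -t -s -u -l 8192 -n 10 goodhost    # on the bad machine (ten blocks)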

Clearly, this puts a premium on using tcp for nfs. All our attempts to do that failed. Well, both of them, anyway. In the first, we unmounted the offending disk, modified its fstab entry, and remounted it. We were more careful in the second, where we added tcp to the mount options in fstab, unmounted all the remote disks, restarted all the nfsd's, and did 'mount -a'. In both cases we got an error message that didn't obviously refer to the tcp option, and the mount didn't happen. As I write this, I see references to tcp mount requests in the mountd man page, so maybe we need to do a bit more here.
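
For concreteness, the second attempt amounted to roughly the following (fileserver, /export/data, and /data are stand-ins for the real names; on these 2.4-era clients the option is spelled tcp, while newer ones also accept proto=tcp):

    # /etc/fstab entry on the client
    fileserver:/export/data  /data  nfs  rw,hard,intr,tcp  0  0

    # then, on the client
    umount /data
    mount -a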

The Wikipedia article on nfs says this: "At the time of introduction of Version 3, vendor support for TCP as a transport-layer protocol began increasing. While several vendors had already added support for NFS Version 2 with TCP as a transport, Sun Microsystems added support for TCP as a transport for NFS at the same time it added support for Version 3."

I'd like to know what version of nfs this server supports, but the man page on nfsd doesn't say. The man page on rpc.mountd says that it supports nfs version 2 and version 3, but that "If the NFS kernel module was compiled without support for NFSv3, rpc.mountd must be invoked with the option --no-nfs-version 3." Yet /proc/<pid>/cmdline for the running rpc.mountd doesn't show a --no-nfs-version argument. Clearly, both the kernel and the server need to support the use of tcp.
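
One check we haven't made yet that should settle the version and transport question is rpcinfo, which lists the program versions and protocols registered with the server's portmapper (fileserver standing in for the real host):

    rpcinfo -p fileserver | egrep 'nfs|mountd'

If no tcp lines show up for the nfs program (100003), the server side isn't offering tcp at all, and no amount of fiddling with client mount options will help.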

I'd like to get our other machines with these older kernels at other sites to use tcp for nfs where possible, in order to avoid this in the future; we are already seeing signs of network problems on them. If that's not possible, then in order to avoid a complete rebuild of those systems -- there are 12 of them -- we are going to put together a testing script using remote invocations of md5sum and comparison of the results against recorded local results.
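
That script will be something along these lines (a minimal sketch; the host names, the test file, and the use of ssh are placeholders for whatever we settle on):

    #!/bin/sh
    # Compare each client's nfs-read checksum of a test file against
    # the known-good sum computed on the server's local disk.
    FILE=/data/testfile
    GOOD=`md5sum $FILE | awk '{print $1}'`
    for host in node01 node02 node03; do
        REMOTE=`ssh $host md5sum $FILE | awk '{print $1}'`
        if [ "$REMOTE" != "$GOOD" ]; then
            echo "$host: MISMATCH ($REMOTE != $GOOD)"
        fi
    done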

Thanks again!


Mike



At 08:54 AM 12/4/2007, you wrote:
Mark,

Thanks for your helpful comments.

At 11:31 PM 12/3/2007, you wrote:
I am guessing you are using TCP NFS mounts as well? TCP forces retries in the event of bad packets. UDP doesn't force this, but the NFS protocol will

UDP has a checksum as well, though it's only 16 bits. Then again, the TCP checksum isn't all that strong for today's data rates either.

From reading the man page on nfs on the systems with the 2.4 kernels, it looks like the default for an nfs mount is udp. It also looks like tcp is not really an option until nfs v4, so it may be something to try on the 2.6 kernels that I have on some of my newer machines at another site.

you should definitely examine /proc/net/dev on involved machines.

I hadn't known about /proc/net/dev. When I check there, I see no transmit errors on the server side and no receive errors on the client side. That's odd, because the other thing I see is that the average packet size received (bytes received divided by packets received) on the client side is 3.9 bytes, while on the server side the average packet size sent is 1430 bytes. In other words, many more packets are being counted as received than there ought to be. That's very fishy.

It's probably a result of the way the packet count is done and reported. That is, it may be that all the received packets -- good and bad -- are counted, but only the bytes in the good ones, with some similar problem on the server side. I think the statistics are aggregate since the last boot, so they may not be just from the troublesome tests I was performing, either.
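
For anyone who wants to compute the same averages, they fall straight out of /proc/net/dev (eth0 stands in for whichever interface is involved; after the colon, fields 1 and 2 are receive bytes and packets, and fields 9 and 10 are transmit bytes and packets on these kernels):

    grep 'eth0:' /proc/net/dev | sed 's/.*://' | \
      awk '{printf "rx avg %.1f B/pkt, tx avg %.1f B/pkt\n", $1/$2, $9/$10}'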

I would attempt to reduce the complexity of your testing.
for instance, can a node write and verify to its local disk
without problem?

The local disk read seems rock solid in comparison to the NFS one. The local md5sum produces the same result time after time, which is just not the case for the remote.

can it stream data over tcp sockets (netcat or the like) without corruption or obvious problems reflected
in /proc/net/dev?

netcat is not on my systems. Looks like I have to get someone to download and build it for me, and try the streaming tests you recommend.
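
Once someone builds it for me, I gather the streaming test would look something like this (a sketch; older netcats want -l -p on the listening side, and receiver and bigfile are stand-ins for a real host and a large test file):

    # on the receiving machine
    nc -l -p 5000 | md5sum

    # on the sending machine
    md5sum bigfile
    nc receiver 5000 < bigfile

    # the two sums should agree; then check /proc/net/dev for new errors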

does ethtool tell you anything about the config of the nic?

Not on the 2.4 systems, though it seems to tell me a little on the 2.6's.

comparing tcp vs udp NFS would be sensible as well - varying the packet size, too. Switching client and/or server to a modern 2.6 kernel may be instructive.

Upgrading the kernel is probably the only way I'll get nfs over tcp. Given that these systems are headed out the door, I'm not sure that's a good use of our time. But it may be worth doing on our new and newer systems.

Thanks again!


Mike

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
