This tale is at an end, I think, because I can't bear to tell it much longer. As many have suggested, there is probably a hardware problem, and since the hardware is old, I will do without the services of the troublesome machines -- it turns out that there is another one acting up as well -- till they are replaced in a couple of weeks.

Many thanks to all who racked their brains for helpful suggestions.

I want to tell a little more of what I have learned, before I drop the subject altogether.

First, I did swap the cable of the bad machine with that of a good one, with no effect on either machine. That eliminates the possibility of the cable or the switch port being bad. Since I had previously changed out the NIC and the switch, the only remaining possibility is something inside the machine itself, probably the motherboard, but possibly a corrupted kernel module for handling udp -- more on that below.

Second, we could find no sign of this failure in any log. Nor did /proc/net/dev show any errors. The suggestion is that these older kernels simply don't detect and report such errors. I think that's related to their doing nfs over udp -- more about that in a moment.

Third, though there is no binary called netcat on these systems, nc (the name the netcat binary is usually installed under) is present. We didn't get around to trying it, because we found ttcp.

Fourth, with ttcp over tcp, I found that the troubled machine could send 800 MB in about 20 seconds -- the wire speed for those 32-bit PCI slots, as measured by netpipe. However, if I used ttcp over udp, I couldn't reliably send even ten 8192-byte blocks! Successive runs would deliver 3, or 1, or 5 of the blocks. Don't ask me how those two facts are compatible. I don't know.
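
For anyone who wants to repeat the comparison, the tests were along these lines (a sketch, assuming the classic ttcp flags -- -t/-r for transmit/receive, -s for the built-in source/sink pattern, -u for udp, -l for buffer length, -n for buffer count; goodhost is a stand-in for the receiving machine):

    # tcp: start the receiver on goodhost, then transmit from the bad machine
    ttcp -r -s                              # on goodhost
    ttcp -t -s -l 8192 -n 100000 goodhost   # on the bad machine (~800 MB)

    # udp: the same pair of commands with -u added
    ttcp -r -s -u                           # on goodhost
    ttcp -t -s -u -l 8192 -n 10 goodhost    # on the bad machine (ten blocks)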

Clearly, this puts a premium on using tcp for nfs. All our attempts to do that failed. Well, both of them, anyway. In the first, we unmounted the offending disk, modified its fstab entry, and remounted it. We were more careful in the second, where we added tcp to the mount options in fstab, unmounted all the remote disks, restarted all the nfsd's, and did 'mount -a'. In both cases we got an error message that didn't obviously refer to the tcp option, and the mount didn't happen. As I write this, I see references to tcp mount requests in the mountd man page, so maybe we need to do a bit more here.
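
For concreteness, the second attempt amounted to roughly the following (fileserver, /export/data, and /data are stand-ins for the real names; on these 2.4-era clients the option is spelled tcp, while newer ones also accept proto=tcp):

    # /etc/fstab entry on the client
    fileserver:/export/data  /data  nfs  rw,hard,intr,tcp  0  0

    # then, on the client
    umount /data
    mount -a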

The Wikipedia article on nfs says this: "At the time of introduction of Version 3, vendor support for TCP as a transport-layer protocol began increasing. While several vendors had already added support for NFS Version 2 with TCP as a transport, Sun Microsystems added support for TCP as a transport for NFS at the same time it added support for Version 3."

I'd like to know what version of nfs this server supports, but the man page on nfsd doesn't say. The man page on rpc.mountd says that it supports nfs version 2 and version 3, but that "If the NFS kernel module was compiled without support for NFSv3, rpc.mountd must be invoked with the option --no-nfs-version 3." Yet /proc/<pid>/cmdline for the running rpc.mountd doesn't show a --no-nfs-version argument. Clearly, both the kernel and the server need to support the use of tcp.
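
One check we haven't made yet that should settle the version and transport question is rpcinfo, which lists the program versions and protocols registered with the server's portmapper (fileserver standing in for the real host):

    rpcinfo -p fileserver | egrep 'nfs|mountd'

If no tcp lines show up for the nfs program (100003), the server side isn't offering tcp at all, and no amount of fiddling with client mount options will help.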

I'd like to get our other machines with these older kernels at other sites to use tcp for nfs where possible, in order to avoid this in the future; we are already seeing signs of network problems on them. If that's not possible, then in order to avoid a complete rebuild of those systems -- there are 12 of them -- we are going to put together a testing script using remote invocations of md5sum and comparison of the results against recorded local results.
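
That script will be something along these lines (a minimal sketch; the host names, the test file, and the use of ssh are placeholders for whatever we settle on):

    #!/bin/sh
    # Compare each client's nfs-read checksum of a test file against
    # the known-good sum computed on the server's local disk.
    FILE=/data/testfile
    GOOD=`md5sum $FILE | awk '{print $1}'`
    for host in node01 node02 node03; do
        REMOTE=`ssh $host md5sum $FILE | awk '{print $1}'`
        if [ "$REMOTE" != "$GOOD" ]; then
            echo "$host: MISMATCH ($REMOTE != $GOOD)"
        fi
    done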

Thanks again!


Mike



At 08:54 AM 12/4/2007, you wrote:
Mark,

Thanks for your helpful comments.

At 11:31 PM 12/3/2007, you wrote:
I am guessing you are using TCP NFS mounts as well? TCP forces retries in the event of bad packets. UDP doesn't force this, but the NFS protocol will

UDP has a checksum as well, though it's only 16 bits. Then again, the TCP checksum isn't all that strong for today's data rates either.

From reading the man page on nfs on the systems with the 2.4 kernels, it looks like the default for an nfs mount is udp. It also looks like tcp is not really an option until nfs v4, so it may be something to try on the 2.6 kernels that I have on some of my newer machines at another site.

you should definitely examine /proc/net/dev on involved machines.

I hadn't known about /proc/net/dev. When I check there, I see no transmit errors on the server side and no receive errors on the client side. That's odd, because the other thing I see is that the average packet size received (bytes received divided by packets received) on the client side is 3.9 bytes, while on the server side the average packet size sent is 1430 bytes. In other words, many more packets are being counted as received than there ought to be. That's very fishy.

It's probably a result of the way the packet count is done and reported. That is, it may be that all the received packets -- good and bad -- are counted, but only the bytes in the good ones, with some similar problem on the server side. I think the statistics are aggregate since the last boot, so they may not be just from the troublesome tests I was performing, either.
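
For anyone who wants to compute the same averages, they fall straight out of /proc/net/dev (eth0 stands in for whichever interface is involved; after the colon, fields 1 and 2 are receive bytes and packets, and fields 9 and 10 are transmit bytes and packets on these kernels):

    grep 'eth0:' /proc/net/dev | sed 's/.*://' | \
      awk '{printf "rx avg %.1f B/pkt, tx avg %.1f B/pkt\n", $1/$2, $9/$10}'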

I would attempt to reduce the complexity of your testing.
for instance, can a node write and verify to its local disk
without problem?

The local disk read seems rock solid in comparison to the NFS one. The local md5sum produces the same result time after time, which is just not the case for the remote.

can it stream data over tcp sockets (netcat or the like) without corruption or obvious problems reflected
in /proc/net/dev?

netcat is not on my systems. Looks like I have to get someone to download and build it for me, and try the streaming tests you recommend.
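
Once someone builds it for me, I gather the streaming test would look something like this (a sketch; older netcats want -l -p on the listening side, and receiver and bigfile are stand-ins for a real host and a large test file):

    # on the receiving machine
    nc -l -p 5000 | md5sum

    # on the sending machine
    md5sum bigfile
    nc receiver 5000 < bigfile

    # the two sums should agree; then check /proc/net/dev for new errors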

does ethtool tell you anything about the config of the nic?

Not on the 2.4 systems, though it seems to tell me a little on the 2.6's.

comparing tcp vs udp NFS would be sensible as well - varying the packet size, too. Switching client and/or server to a modern 2.6 kernel may be instructive.

Upgrading the kernel is probably the only way I'll get nfs over tcp. Given that these systems are headed out the door, I'm not sure that's a good use of our time. But it may be worth doing on our new and newer systems.

Thanks again!


Mike

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
