Mark,

Thanks for your helpful comments.

At 11:31 PM 12/3/2007, you wrote:
I am guessing you are using TCP NFS mounts as well? TCP forces retries in the event of bad packets. UDP doesn't force this, but the NFS protocol will

UDP has a checksum as well, though it's only 16b.  then again, the TCP
checksum isn't all that strong for today's data rates either.

From reading the man page on nfs on the systems with the 2.4 kernels, it looks like the default for an nfs mount is udp. It also looks like tcp is not really an option until nfs v4, so it may be something to try on the 2.6 kernels that I have on some of my newer machines at another site.

you should definitely examine /proc/net/dev on involved machines.

I hadn't known about /proc/net/dev. When I check there, I see no transmit errors on the server side and no receive errors on the client side. That's odd, because the other thing I see is that the average packet size received (bytes received divided by packets received) on the client side is 3.9 bytes, while on the server side, the average packet size sent is 1430 bytes. In other words, there are many more packets received than there ought to be. That's very fishy. It's probably the result of the way the packet count is done and reported. I.e., it may be that all the received packets -- good and bad -- are counted, but only the bytes in the good ones are counted, with some similar problem on the server side. I think the statistics are aggregate since the last boot, so they may not be just from the troublesome tests I was performing, either.
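
The arithmetic above (bytes divided by packets, per interface) can be sketched in Python. This is only a sketch under assumptions: the function names parse_net_dev, avg_size and delta are made up for illustration, Python is assumed to be available on the nodes, and the column order is assumed to be the usual /proc/net/dev layout (eight receive counters followed by eight transmit counters). Taking the difference of two snapshots around a test run avoids mixing in everything since the last boot.

```python
def parse_net_dev(text):
    """Parse the text of /proc/net/dev into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        if ":" not in line:
            continue
        # Split on ':' because older kernels may print "eth0:1234" with no space.
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        # Assumed layout: receive = bytes packets errs drop fifo frame
        # compressed multicast, then transmit = bytes packets errs ...
        stats[name.strip()] = {
            "rx_bytes": fields[0], "rx_packets": fields[1], "rx_errs": fields[2],
            "tx_bytes": fields[8], "tx_packets": fields[9], "tx_errs": fields[10],
        }
    return stats


def avg_size(nbytes, npackets):
    """Average packet size in bytes; 0.0 when no packets were counted."""
    return nbytes / npackets if npackets else 0.0


def delta(before, after):
    """Per-interface counter differences between two snapshots."""
    return {
        iface: {key: after[iface][key] - before[iface][key] for key in after[iface]}
        for iface in after
        if iface in before
    }
```

To use it, read /proc/net/dev into a string before and after a test, pass both through parse_net_dev, and apply avg_size to the rx_bytes and rx_packets entries of the delta. One caveat: on 32-bit kernels these counters may wrap around, which would make an average computed from the raw totals misleading; the sketch does not try to correct for that.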

I would attempt to reduce the complexity of your testing.
for instance, can a node write and verify to its local disk
without problem?

The local disk read seems rock solid in comparison to the NFS one. The local md5sum produces the same result time after time, which is just not the case for the remote.
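
A minimal Python sketch of that repeat-and-compare test follows, so the same check can be run against a local path and an NFS path. The names md5_of, repeated_digests and is_stable are hypothetical, and the run count and chunk size are arbitrary choices.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, runs=5):
    """Hash the same file several times and return all digests in order."""
    return [md5_of(path) for _ in range(runs)]


def is_stable(digests):
    """True when every run produced the same digest."""
    return len(set(digests)) == 1
```

Calling is_stable(repeated_digests(path)) on a file on local disk and then on the same file through the NFS mount gives a direct comparison. Repeated reads of a small file may be served from the client's cache rather than the wire, so a file larger than memory (or a remount between runs) is probably needed for the NFS case to mean much.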

can it stream data over tcp sockets (netcat or the like) without corruption or obvious problems reflected
in /proc/net/dev?

netcat is not on my systems. Looks like I have to get someone to download and build it for me, and try the streaming tests you recommend.
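
Until netcat is built, something like the following Python sketch might stand in for the streaming test, assuming Python is available on both machines. It is not netcat and makes no attempt to be; serve_once and send_file are names invented here, and the chunk size is arbitrary. One side listens and hashes whatever arrives, the other streams a file and hashes what it sends, and the two (byte count, digest) pairs are compared by hand.

```python
import hashlib
import socket

CHUNK = 64 * 1024  # arbitrary read/write size for this sketch


def serve_once(listener):
    """Accept one connection on a listening socket, read until EOF,
    and return (byte_count, md5_hex) of the data received."""
    conn, _ = listener.accept()
    h = hashlib.md5()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:            # empty read means the sender closed
                break
            h.update(data)
            total += len(data)
    return total, h.hexdigest()


def send_file(path, host, port):
    """Stream a file to host:port over TCP and return
    (byte_count, md5_hex) of the data sent."""
    h = hashlib.md5()
    total = 0
    with socket.create_connection((host, port)) as s, open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            s.sendall(chunk)
            h.update(chunk)
            total += len(chunk)
    return total, h.hexdigest()
```

On the receiving node, create a TCP socket, bind and listen on a free port, and call serve_once on it; on the sending node, call send_file with a file read from local disk so that NFS is out of the picture. If the two results match over many runs while /proc/net/dev stays clean, the network path itself is probably not the culprit.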

does ethtool tell you anything about the config of the nic?

Not on the 2.4 systems, though it seems to tell me a little on the 2.6's.

comparing tcp vs udp NFS would be sensible
as well - varying the packet size, too. switching client and/or server to a modern 2.6 kernel may be instructive.

Upgrading the kernel is probably the only way I'll get nfs over tcp. Given that these systems are headed out the door, I'm not sure that's a good use of our time. But it may be worth doing on our new and newer systems.

Thanks again!


Mike
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
