Bogdan Costescu wrote:
On Tue, 24 Apr 2007, Mark Hahn wrote:

so the main question is whether jumbo is worth the effort.

I would rephrase it to say: whether jumbo is worth the effort for the root FS. When I used NFSroot, most of the traffic was queries of file/dir existence and file/dir attributes, which are small, so a large maximum packet size would not help. Furthermore, most of the files accessed were small, which means that the client could be quite successful in caching them for a long time, and the actual transfer (if the cache is emptied) would not take too long.

I agree that jumbo frames would not be a great help with the root file system, but we hope to get better performance out of the other NFS servers. Since all the machines on the same subnet have to use jumbo frames, I have to boot the machines from a server that has jumbo frames enabled. (Otherwise I would need an extra Ethernet card on every node just for booting, so that the boot server could sit on a different subnet with a 1500-byte MTU.)

We are very sure that our current bottlenecks lie at the NFS level: neither the hard drives nor the Ethernet are saturated, and while NFS is extremely slow, copying files between a client and a server over scp is still very fast. We have tried all the usual ways of tuning NFS for better performance (increasing the number of NFS daemons on the servers, changing rsize and wsize, TCP vs. UDP, async vs. sync, noatime, timeo). The only thing we have not been able to try yet is jumbo frames. We could redistribute our data across even more NFS servers, but that is not possible with the current state of the application. If we don't find a solution soon, we might have to give up on NFS and try a clustered file system instead.
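For concreteness, the client mount options we have been cycling through look roughly like the fstab line below (the server name, export path and values are only an illustration of the knobs mentioned above, not a tuned recommendation):

nfsserver1:/export/data  /data  nfs  rw,hard,intr,proto=tcp,rsize=32768,wsize=32768,timeo=600,noatime  0 0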


I think that it is more important to think carefully about the placement and exporting of the files on the NFS server. If you can manage to export a single directory which is mounted as-is by the clients, and have the few client-specific files either mounted one by one or copied/generated on the fly and placed on a tmpfs (and linked from there), you can speed up the serving of the files, as the most frequently accessed files will stay in the server's cache. The Stateless Linux project from Red Hat/Fedora used such a system (a single root FS, then client-specific files mounted one by one) last time I looked at it.
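A rough sketch of that scheme as I understand it (all paths, template names and the choice of files are hypothetical):

# The shared read-only root ships symlinks such as
#   /etc/ntp.conf -> /node/ntp.conf
# and an early boot script fills in the tmpfs behind them:
mount -t tmpfs -o size=16m tmpfs /node
sed "s/@NODE@/$(hostname)/" /usr/share/node-templates/ntp.conf.in > /node/ntp.conf
# Files that genuinely differ per node on the server could instead be
# NFS-mounted one by one over the shared root:
mount -o ro 192.168.1.254:/slave/per-node/$(hostname)/ssh /etc/ssh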

here's a cute hack: if you're using pxelinux, it has an "ipappend" feature,
...
I haven't had time to try this...

It works as you described it.

But even the first idea you mentioned, using dhclient to get an IP, would work just fine if the number of nodes is not too big. I have 100+ nodes configured that way, with two DHCP requests per node boot (the PXE one and the dhclient one), as I was simply too lazy to eliminate the second request by re-using the info from PXE, and the master node doesn't feel the load at all, although hardware-wise it is the poorest machine in the cluster (as opposed to most other clusters that I know of ;-)).

The way the nodes boot now, they use pxelinux to get the IP address and then download the kernel from the TFTP server. The PXE configuration is as follows:

DEFAULT bzImage
APPEND acpi=off debug earlyprintk=vga initcall_debug console=tty0 initrd=bzImage ramdisk=40960 root=/dev/nfs nfsroot=192.168.1.254:/slave/root ro ip=dhcp
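If I understand Mark's ipappend hint correctly, the configuration would become something roughly like the following (untested here; with IPAPPEND 1, pxelinux is supposed to append an ip=<client>:<server>:<gateway>:<netmask> string to the kernel command line itself, so ip=dhcp would be dropped):

DEFAULT bzImage
LABEL bzImage
  KERNEL bzImage
  IPAPPEND 1
  APPEND acpi=off debug earlyprintk=vga initcall_debug console=tty0 initrd=bzImage ramdisk=40960 root=/dev/nfs nfsroot=192.168.1.254:/slave/root ro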

Is it being suggested that the MTU size can somehow be configured here? /sbin/dhclient-script will not be available until the NFS root is mounted. Am I missing something?
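The only way I can see to get a jumbo MTU in place before the NFS root is mounted would be a small initramfs that brings the interface up itself, roughly along the lines of the sketch below (completely untested; the device name, MTU, server address and the assumption of a busybox build with udhcpc and NFS mount support are all mine). Is that the sort of thing being suggested?

#!/bin/sh
# Sketch of an initramfs /init that raises the MTU before mounting the NFS root.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
ip link set eth0 mtu 9000
ip link set eth0 up
udhcpc -i eth0                 # udhcpc's helper script applies the lease
mount -t nfs -o ro,nolock 192.168.1.254:/slave/root /newroot
exec switch_root /newroot /sbin/init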



thanks

Amrik


