Bogdan Costescu wrote:
On Tue, 24 Apr 2007, Mark Hahn wrote:

so the main question is whether jumbo is worth the effort.

I would rephrase it to say: whether jumbo is worth the effort for the root FS. When I used NFSroot, most of the traffic was queries of file/dir existence and file/dir attributes, which are small, so a large maximum packet size would not help. Furthermore, most of the files accessed were small, which means that the client could be quite successful in caching them for a long time, and the actual transfer (if the cache is emptied) would not take too long.

I agree that jumbo frames would not be a great help with the root file system, but we hope to get better performance out of the other NFS servers. Since all the machines on the same subnet have to use jumbo frames, I have to boot the machines from a server that has jumbo frames enabled. (Otherwise I would need an extra Ethernet card on every node just for booting, so that the boot server could sit on a different subnet with a 1500-byte MTU.)

We are very sure that our current bottlenecks lie at the NFS level: neither the hard drives nor the Ethernet are saturated, and while NFS is extremely slow, copying files between a client and a server over scp is still very fast. We have tried all the usual ways of tuning NFS for better performance (increasing the number of NFS daemons on the servers, changing rsize and wsize, TCP vs. UDP, async vs. sync, noatime, timeo). The only thing we have not been able to try yet is jumbo frames. We could redistribute our data across even more NFS servers, but that is not possible with the current state of the application. If we don't find a solution soon, we might have to give up on NFS and try a clustered file system instead.
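For concreteness, the client mount options we have been cycling through look roughly like the fstab line below (the server name, export path and values are only an illustration of the knobs mentioned above, not a tuned recommendation):

nfsserver1:/export/data  /data  nfs  rw,hard,intr,proto=tcp,rsize=32768,wsize=32768,timeo=600,noatime  0 0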


I think that it is more important to think carefully about the placement and exporting of the files on the NFS server. If you can manage to export a single directory which is mounted as-is by the clients, and have the few client-specific files either mounted one by one or copied/generated on the fly and placed on a tmpfs (and linked from there), you can speed up the serving of the files, as the most frequently accessed files will stay in the server's cache. The Stateless Linux project from Red Hat/Fedora used such a system (a single root FS, then client-specific files mounted one by one) last time I looked at it.
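A rough sketch of that scheme as I understand it (all paths, template names and the choice of files are hypothetical):

# The shared read-only root ships symlinks such as
#   /etc/ntp.conf -> /node/ntp.conf
# and an early boot script fills in the tmpfs behind them:
mount -t tmpfs -o size=16m tmpfs /node
sed "s/@NODE@/$(hostname)/" /usr/share/node-templates/ntp.conf.in > /node/ntp.conf
# Files that genuinely differ per node on the server could instead be
# NFS-mounted one by one over the shared root:
mount -o ro 192.168.1.254:/slave/per-node/$(hostname)/ssh /etc/ssh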

here's a cute hack: if you're using pxelinux, it has an "ipappend" feature,
...
I haven't had time to try this...

It works as you described it.

But even the first idea you mentioned, using dhclient to get an IP, would work just fine if the number of nodes is not too big. I have 100+ nodes configured that way, with two DHCP requests per node boot (the PXE one and the dhclient one), as I was simply too lazy to eliminate the second request by re-using the info from PXE, and the master node doesn't feel the load at all, although hardware-wise it is the poorest machine in the cluster (as opposed to most other clusters that I know of ;-)).

The way the nodes boot now, they use pxelinux to get the IP address and then download the kernel from the TFTP server. The PXE configuration is as follows:

DEFAULT bzImage
APPEND acpi=off debug earlyprintk=vga initcall_debug console=tty0 initrd=bzImage ramdisk=40960 root=/dev/nfs nfsroot=192.168.1.254:/slave/root ro ip=dhcp
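If I understand Mark's ipappend hint correctly, the configuration would become something roughly like the following (untested here; with IPAPPEND 1, pxelinux is supposed to append an ip=<client>:<server>:<gateway>:<netmask> string to the kernel command line itself, so ip=dhcp would be dropped):

DEFAULT bzImage
LABEL bzImage
  KERNEL bzImage
  IPAPPEND 1
  APPEND acpi=off debug earlyprintk=vga initcall_debug console=tty0 initrd=bzImage ramdisk=40960 root=/dev/nfs nfsroot=192.168.1.254:/slave/root ro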

Is it being suggested that the MTU size can somehow be configured here? /sbin/dhclient-script will not be available until the NFS root is mounted. Am I missing something?
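The only way I can see to get a jumbo MTU in place before the NFS root is mounted would be a small initramfs that brings the interface up itself, roughly along the lines of the sketch below (completely untested; the device name, MTU, server address and the assumption of a busybox build with udhcpc and NFS mount support are all mine). Is that the sort of thing being suggested?

#!/bin/sh
# Sketch of an initramfs /init that raises the MTU before mounting the NFS root.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
ip link set eth0 mtu 9000
ip link set eth0 up
udhcpc -i eth0                 # udhcpc's helper script applies the lease
mount -t nfs -o ro,nolock 192.168.1.254:/slave/root /newroot
exec switch_root /newroot /sbin/init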



thanks

Amrik


