Jorge Salamero Sanz wrote:
Hi all,

I'm going to move a 42-node Beowulf cluster to diskless mode (currently all nodes run cloned installations on local disks).

Which systems or tools do you recommend for managing the client images?

I was thinking of a debootstrapped directory shared as the NFS root. The differences between the nodes (/etc/hostname, /etc/fstab, /etc/exports ...) could be managed with unionfs.

Debian has a couple of tools that could help (live-helper for making custom images), but maybe lessdisk would be more suitable. Which one do you use?

How do you manage this kind of cluster setup?

Hi Jorge,

We built a system like this for a customer a few years ago and it has performed very well. We have a head node that acts as both the management node and the NFS server for the diskless workstations in the cluster.

We used debootstrap to build images for the diskless nodes. We opted to keep a separate image for each diskless node to keep things simple. I'm sure you could do the same with unionfs or similar, but disk space is cheap and the effort of putting something together with unionfs didn't seem justified at the time.
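With one root tree per node, the head node needs one NFS export per node. The following is a hypothetical sketch, not the configuration used in the setup described above: it assumes roots live under /srv/nfsroot/<hostname> and that nodes sit on a private 10.0.0.0/24 network. The export options are standard exports(5) options; no_root_squash matters for an NFS root because the client's root user must own its own system files.

```python
# Hypothetical layout: one root tree per node under <root_base>/<hostname>.
# no_root_squash lets the client's root user act as root on its own tree,
# which a root filesystem needs.
EXPORT_OPTIONS = "rw,sync,no_root_squash,no_subtree_check"


def export_line(root_base: str, hostname: str, client_ip: str) -> str:
    """Return one /etc/exports entry giving a single node access to its own root."""
    return f"{root_base}/{hostname} {client_ip}({EXPORT_OPTIONS})"


def exports_file(root_base: str, nodes: dict[str, str]) -> str:
    """Build /etc/exports text from a hostname -> IP mapping, sorted by hostname."""
    lines = [export_line(root_base, name, ip) for name, ip in sorted(nodes.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addressing: node01..node42 at 10.0.0.101..10.0.0.142.
    nodes = {f"node{i:02d}": f"10.0.0.{100 + i}" for i in range(1, 43)}
    print(exports_file("/srv/nfsroot", nodes), end="")
```

Restricting each export to a single client IP means a misconfigured node cannot mount another node's root by accident.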

To add a new node, we simply copy the debootstrapped directory contents and change the hostname.
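The "copy and change the hostname" step might look like the following hypothetical sketch. It is not the script used in the setup above; the paths are assumptions. It shells out to GNU cp -a rather than using shutil.copytree, because copytree does not preserve file owner and group, which a root filesystem depends on.

```python
import subprocess
from pathlib import Path


def add_node(template: Path, root_base: Path, hostname: str) -> Path:
    """Clone a debootstrapped template tree to root_base/hostname and set /etc/hostname.

    Uses GNU 'cp -a' so ownership, permissions and symlinks are preserved.
    Run as root on the head node, otherwise ownership cannot be kept.
    """
    node_root = root_base / hostname
    # Refuse to overwrite: if the target existed, 'cp -a' would copy the
    # template *into* it as a subdirectory instead of replacing it.
    if node_root.exists():
        raise FileExistsError(f"{node_root} already exists")
    root_base.mkdir(parents=True, exist_ok=True)
    subprocess.run(["cp", "-a", str(template), str(node_root)], check=True)
    # The only per-node change described in the text: the hostname.
    (node_root / "etc" / "hostname").write_text(hostname + "\n")
    return node_root
```

For example, add_node(Path("/srv/nfsroot/template"), Path("/srv/nfsroot"), "node43") would create /srv/nfsroot/node43 (hypothetical paths).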

Each diskless node boots via PXE and runs a monolithic kernel compiled with just the basics needed for the compute nodes. We did some experiments with initrd images and modular kernels, but some issues with Debian caused us problems (see bugs http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=386959 and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=388761). These may have been fixed since we last looked, but again, given that we had a working system, the effort to fix them wasn't justified.
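For a PXE-booted node with a monolithic kernel and an NFS root, a per-node PXELINUX entry could be generated roughly as below. This is a hypothetical sketch: the kernel file name, server IP and root path are assumptions, and the kernel is assumed to have CONFIG_ROOT_NFS and CONFIG_IP_PNP_DHCP built in, which is what makes an initrd unnecessary. root=/dev/nfs, nfsroot= and ip=dhcp are the kernel's standard NFS-root boot parameters.

```python
def pxelinux_config_name(mac: str) -> str:
    """Per-node file name in pxelinux.cfg/ keyed on the boot NIC's MAC.

    PXELINUX tries '01-' (the ARP hardware type for Ethernet) followed by the
    MAC in lowercase with hyphens, and falls back to 'default' if no more
    specific file exists.
    """
    return "01-" + mac.strip().lower().replace(":", "-")


def pxelinux_config(kernel: str, server_ip: str, node_root: str) -> str:
    """Boot entry for a monolithic kernel that mounts its root over NFS.

    There is deliberately no INITRD line: NFS client, NFS root and kernel
    IP autoconfiguration are assumed to be compiled into the kernel.
    """
    return (
        "DEFAULT compute\n"
        "LABEL compute\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND root=/dev/nfs nfsroot={server_ip}:{node_root} ip=dhcp rw\n"
    )


if __name__ == "__main__":
    # Hypothetical values: head node at 10.0.0.1, node07's root under /srv/nfsroot.
    print(pxelinux_config_name("00:1A:2B:3C:4D:5E"))
    print(pxelinux_config("vmlinuz-compute", "10.0.0.1", "/srv/nfsroot/node07"), end="")
```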

You may need a separate network for PXE booting your nodes. We experienced problems using a gigabit Ethernet network for PXE booting: nodes randomly failed to get a response from the DHCP server. I suspect a bug in the PXE firmware occasionally caused it to fail while the network cards were negotiating gigabit speed (but I have no evidence to back this up). Moving PXE booting to a separate Fast Ethernet network resolved the problem.
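If PXE booting moves to a dedicated network, the DHCP server's host entries should hand out addresses on that network only. The sketch below is hypothetical: the 192.168.100.0/24 boot subnet and the 192.168.100.1 TFTP server are assumptions. It emits ISC dhcpd host declarations (host, hardware ethernet, fixed-address, next-server, filename) and rejects any address outside the boot subnet, so a node is not accidentally pointed at the gigabit network.

```python
import ipaddress

# Hypothetical dedicated Fast Ethernet boot network and head-node TFTP address.
BOOT_NET = ipaddress.ip_network("192.168.100.0/24")
TFTP_SERVER = "192.168.100.1"


def host_stanza(hostname: str, mac: str, ip: str) -> str:
    """Return an ISC dhcpd host declaration pinning a node's boot NIC to a fixed address."""
    if ipaddress.ip_address(ip) not in BOOT_NET:
        raise ValueError(f"{hostname}: {ip} is not on the boot network {BOOT_NET}")
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac.lower()};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {TFTP_SERVER};\n"
        '  filename "pxelinux.0";\n'
        "}\n"
    )


if __name__ == "__main__":
    print(host_stanza("node07", "00:1A:2B:3C:4D:5E", "192.168.100.107"), end="")
```

The generated stanzas would go inside the subnet declaration for the boot interface in dhcpd.conf.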

I'm not familiar with live-helper or lessdisk; perhaps I need to do more reading :)

Hope this info is of some use. I've probably only covered a few random aspects of our config that spring to mind...

-stephen


--
Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland.  +353.91.751262  http://www.aplpi.com
Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway)
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org