> Does anyone know what types of problems/challenges for big clusters?

cooling, power, manageability, reliability, delivering IO, space.

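to put rough numbers on the cooling and power items, here is a back-of-envelope sketch. the unit conversions (1 W is about 3.412 BTU/hr, 1 ton of refrigeration is 12,000 BTU/hr) are standard; everything else is an assumption for illustration: the function name `cluster_heat_load`, the `overhead` knob, and especially the 200 W per node figure, which is a placeholder and not a measurement of any real node.

```python
# back-of-envelope power and cooling estimate for a cluster.
# All node wattages used here are assumed placeholders, not measurements.

BTU_PER_HR_PER_WATT = 3.412    # 1 W of dissipated power ~= 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0   # 1 ton of refrigeration = 12,000 BTU/hr


def cluster_heat_load(nodes, watts_per_node, overhead=0.0):
    """Return (kW, BTU/hr, tons of cooling) for a given node count.

    overhead is a hypothetical fractional allowance for switches,
    fileservers and the like; 0.0 means compute nodes only.
    """
    watts = nodes * watts_per_node * (1.0 + overhead)
    btu_per_hr = watts * BTU_PER_HR_PER_WATT
    tons = btu_per_hr / BTU_PER_HR_PER_TON
    return watts / 1000.0, btu_per_hr, tons


if __name__ == "__main__":
    WATTS_PER_NODE = 200.0  # assumption: replace with a node measured under load
    for n in (200, 512, 768):
        kw, btu, tons = cluster_heat_load(n, WATTS_PER_NODE)
        print(f"{n:4d} nodes: {kw:7.1f} kW, {btu:10.0f} BTU/hr, {tons:5.1f} tons")
```

the point is only that the load scales linearly with node count, so a measured per-node draw is the one number worth getting right before talking to whoever runs the machine room.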
> we are considering having a 512 node cluster that will be using
> Myrinet as its main interconnect, and would like to do our homework

how confident are you about addressing especially the physical issues
above? cooling and power happen to be prominent in my awareness right
now because of a 768-node cluster I'm working on. but even ~200 node
clusters need some careful thought applied to manageability (cleaning
up dead jobs, making sure the scheduler doesn't let jobs hang around
consuming Myrinet ports, for instance.)

reliability is a fairly cut and dried issue, IMO - either you make the
right hardware decisions at purchase, or not.

> The cluster is meant to run an inhouse fluid simulation application
> that is I/O intensive, and requires large memory models.

what parallel-cluster filesystem are you planning to run? how many
fileservers? (or is the IO intensity manageable using per-node disks?)

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf