Would this mean that a user's environment could never exceed the resources of a single node?
Andy

On 26/07/07, Julien Leduc <[EMAIL PROTECTED]> wrote:
>> I'm interested in utilising the hardware to create something akin to
>> the sun grid or the amazon elastic computing cloud whereby the
>> resources available to the environment are automatically expanded and
>> contracted. Maybe I have the wrong end of the stick on how these
>> services operate.
>
> no, I think you're right on, and there's not much to it. why do you
> think Sun or Amazon have any special magic? beowulf clusters running
> multi-user queueing systems are precisely such an "elastic",
> "compute-on-demand" thingy, just without paying for the isolation,
> because such clusters are mainly motivated by performance.

Running a multi-user queueing system, you can have a cluster that behaves like the Sun or Amazon projects: you just choose the nodes that can fulfill the user's needs and requirements, fetch a VM onto those chosen nodes (during the 'prolog' section of the batch scheduler), start the VMs on the physical nodes, and ensure the user can log on to them, or fetch his data and run a passive job. Then, once the job is finished, clean up all that mess by destroying the VMs, and let another user reserve the nodes.

More isolation can be achieved if the user needs to be root on the node, to run a modified version of the kernel, or to run several VMs on top of his environment. For that, you have to let him deploy his own environment on the node. This last technique ensures reproducible experiments and better performance; the drawback is more work on the middleware that makes all that magic come true.

Combining the two previous techniques could help users test their OS + experiment program in a VM and then deploy it at larger scale for a true run on all the cluster(s ;) ).

This is a very interesting approach (at least for computer scientists), and the second approach gives quite good results for the moment. The combination of the two techniques still has to be implemented; it would free up more resources, so that users can test their environments on many virtual nodes while consuming fewer physical nodes.

The main problem is to be able to control the nodes remotely, with hardware supporting remote reboots, remote console management...

Julien Leduc
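
The reserve / prolog / job / epilog cycle described in the message can be sketched roughly as below. This is only an illustration under stated assumptions, not any real scheduler's API: Node, select_nodes, prolog, epilog and run_job are hypothetical names, and "starting a VM" is simulated by setting a flag rather than calling a hypervisor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical record for one physical node in the cluster."""
    name: str
    cores: int
    mem_gb: int
    reserved_by: Optional[str] = None
    vm_image: Optional[str] = None
    vm_running: bool = False


def select_nodes(nodes: List[Node], count: int,
                 min_cores: int, min_mem_gb: int) -> List[Node]:
    """Choose free nodes that fulfill the user's requirements."""
    free = [n for n in nodes
            if n.reserved_by is None
            and n.cores >= min_cores
            and n.mem_gb >= min_mem_gb]
    if len(free) < count:
        raise RuntimeError("not enough free nodes match the request")
    return free[:count]


def prolog(node: Node, user: str, image: str) -> None:
    """Reserve the node, then 'fetch' and 'start' the VM (simulated)."""
    node.reserved_by = user
    node.vm_image = image
    node.vm_running = True


def epilog(node: Node) -> None:
    """Destroy the VM and release the node for the next user."""
    node.vm_running = False
    node.vm_image = None
    node.reserved_by = None


def run_job(nodes: List[Node], user: str, image: str,
            job: Callable[[List[str]], object],
            count: int, min_cores: int, min_mem_gb: int) -> object:
    """Run `job` on `count` nodes, always cleaning up afterwards."""
    chosen = select_nodes(nodes, count, min_cores, min_mem_gb)
    started: List[Node] = []
    try:
        for node in chosen:
            prolog(node, user, image)
            started.append(node)
        # The job receives the names of the nodes its VMs run on.
        return job([n.name for n in chosen])
    finally:
        # Runs even if the prolog or the job fails part-way through.
        for node in started:
            epilog(node)
```

In this sketch a request with count greater than 1 spans several physical nodes, which matches the "choose the nodes" wording in the message; how the VMs on those nodes are tied together into one user environment is left out entirely.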
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf