On Tue, 9 Oct 2007, Douglas Eadline wrote:
Excellent point. I have often thought that "diskless" provisioning
opens up lots of opportunities to create custom node groups
based on kernels or distributions. Throw in a virtualized
head node and many ISV requirements could be handled this way
e.g. a virtualized Suse environment running on top of Red Hat
could request 32 Suse nodes from the scheduler (running under a
Red Hat instance). The scheduler just provisions nodes as needed
and sets them in a low power state when not being used.
Going with fully virtualized nodes is another option, provided
the applications can still get close enough to the hardware.
Note that diskless provisioning does not imply diskless nodes;
if you need local drives, you can still use them under a
diskless booting scheme. Not nailing an OS to the hard drive
on cluster nodes has lots of advantages.
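To make the diskless idea concrete, here is a minimal sketch of a pxelinux
boot entry for an NFS-root node. The kernel/initrd filenames, export path,
and server address are all hypothetical placeholders, not a recipe:

```
# /tftpboot/pxelinux.cfg/default -- illustrative diskless entry
DEFAULT centos5-diskless

LABEL centos5-diskless
    KERNEL vmlinuz-centos5
    APPEND initrd=initrd-centos5.img root=/dev/nfs \
        nfsroot=10.0.0.1:/exports/centos5-root ip=dhcp ro
```

Swapping the node to a different distribution is then just a matter of
pointing its pxelinux config at a different kernel/root pair and rebooting.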
This has been the subject for lots of Real Computer Science, some of it
done by Jeff Chase and students here at Duke (including Greg Lindahl's
ex-student, Justin Moore). See "Cluster On Demand" here:
http://www.cs.duke.edu/nicl/cod/
(and there are various other links GIYF). COD is basically (as I
understand it) a layer of automated "provisioning software" that takes a
user's resource request (written IIRC in xmlish but I could easily be
wrong), creates a one-time cluster boot image that satisfies it,
allocates nodes from a large, generic pool, reboots them into the boot
image, connects them with suitable workspace (part of the provisioning),
and even starts up the user's job(s) on them. Or not.
The provisioning is pretty much OS-neutral. Want a Windows cluster? No
problem (licensing permitting, of course). Solaris? If it will run on
the hardware and is supported, sure. Linux obviously -- any flavor, any
size, licensed as needed or not. Ditto all the other free and open
source OS's. If you have specific needs for libraries, tools, memory,
processor count or cores, networking (and the needs can be met within
the cluster pool) it will allocate nodes, provision them, and crank them
up for you.
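As a purely illustrative sketch (this is NOT COD's actual request schema,
which as noted above I may be misremembering entirely), such a resource
request might look something like:

```
<!-- hypothetical resource request; element and attribute names invented -->
<cluster-request user="jdoe">
  <nodes count="32"/>
  <os distribution="suse" version="10.0"/>
  <memory min="4096" units="MB"/>
  <network type="gigabit-ethernet"/>
  <job script="/home/jdoe/run-simulation.sh"/>
</cluster-request>
```

The provisioning layer's job is then to find idle nodes in the generic pool
that can satisfy every constraint, build or select a matching boot image,
and reboot those nodes into it.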
One of several GOOD things about this is that your nodes GO AWAY after
you are done with them. Doing top-secret work for NSA? Once you're
done (especially with diskless provisioning) there isn't even a disk
image left on the nodes to be reconstructed by means of advanced
magnetic analysis...
Provisioning really doesn't take very long any more. Diskless takes almost
no time at all, but even provisioning a full local boot image needn't take
very long.
I don't know the status of this project, but just wanted to point out
that this is going on and that one day we may yet see a full open source
solution built into Linux (as Linux is a very reasonable choice for a
toplevel platform to run this). All of this can obviously be done by
hand, but it's the automation part that is interesting. And of course
the advent of serious VMs with processor-level support means that we will
shortly have even more options -- a whole second way of doing it NOW is
to create and provision portable VMs and run them under e.g. VMware or
whatever.
rgb
--
Doug
On Mon, 8 Oct 2007, Robert G. Brown wrote:
RHEL/Centos are good where vendors require "binary compatibility" on
closed source software, as the standard of said binary
compatibility.
What strikes me in this whole discussion is the idea of 'one
distribution fits all' when applied to all nodes of a cluster and all
applications that run on that cluster. In the days of PXE booting,
with several solutions readily available for either building a node
from scratch (like kickstart) or booting a prebuilt setup with
NFS-root or a ramdisk, what's so difficult about matching, on request,
a node, an application, and a distribution/custom setup?
Real case: a quantum mechanics code that we bought some years ago
was provided only as statically linked binaries. They worked fine
on the distros current at that time, and we successfully used them
on CentOS-3 (2.4 kernel). However, we discovered the hard way on the
new CentOS-5 (2.6 kernel) that the statically linked binaries no
longer worked, as the kernel interfaces had changed -- but, after a few
lines were changed in the config files and the nodes rebooted, the
binaries were again happily running in their required configuration.
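A statically linked binary carries no dynamic loader, so it talks to the
kernel ABI directly, which is exactly why a kernel interface change can
break it while dynamically linked programs keep working. A quick way to
check whether a binary is static is to look for a PT_INTERP program header
in its ELF image. Here is a minimal sketch in Python (offsets are from the
ELF specification; error handling is deliberately sparse):

```python
import struct

PT_INTERP = 3  # program header type naming the dynamic loader


def is_static_elf(data: bytes) -> bool:
    """Return True if the ELF image has no PT_INTERP header,
    i.e. no dynamic loader is requested (statically linked)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from(endian + "I",
                                       data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return False         # has an interpreter: dynamically linked
    return True


# usage: is_static_elf(open("/opt/qm-code/bin/solver", "rb").read())
```

(`file` or `ldd` on the command line tells you the same thing; the point is
only to show what "statically linked" means at the ELF level.)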
Of course, the admin is responsible for defining which
distributions/custom setups can run on a certain node, based on the
hardware of that node and the kernel of the distribution/custom setup.
But once this is done, the user can limit his/her jobs to running on
these nodes, or ask the queueing system to set up a node according to
the requirements of the job (I think the term is 'provisioning').
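From the user's side this can be as simple as requesting a node property in
the job script. A hypothetical Torque/PBS-style sketch (the property name
'centos3-24' is invented here; the admin would define it to mean "node that
can boot the CentOS-3 / 2.4-kernel image"):

```
#!/bin/sh
#PBS -l nodes=32:centos3-24
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
./qm-solver input.dat
```

The scheduler then only places the job on nodes carrying that property, or,
with a provisioning hook, reboots idle generic nodes into the matching image
first.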
Sure, it helps in this case to run a distribution with long-term support
(like RHEL/CentOS/SL, SLES or Ubuntu LTS), so that you don't have to
waste too much of your own time on updates, especially security-related
ones.
Far short of Debian, but plenty big enough to include just about all
mainstream useful packages for any cluster or LAN.
I'm making sure that any cluster-related package that is part of the
default distribution is not part of what the nodes get to run. Why?
Because very often the lowest-common-denominator options used for building
the package (which is a good idea for a widely used distribution) don't
fit _my_ setup. So, I take the fact that the distribution offers me all
the needed tools as a fallback, but I'm always trying to match all the
components as well as possible. And if you search the archives of the
LAM/MPI mailing lists you'll see the larger picture...
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
--
Robert G. Brown
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone(cell): 1-919-280-8443
Web: http://www.phy.duke.edu/~rgb
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977