Re: [Beowulf] What services do you run on your cluster nodes?

Robert G. Brown Tue, 23 Sep 2008 03:16:55 -0700

On Mon, 22 Sep 2008, Joe Landman wrote:

Prentice Bisbal wrote:
The more services you run on your cluster nodes (gmond, sendmail, etc.),
the less performance is available for number crunching, but at the same
time, turning services off makes administration harder. For example, if you turn off
postfix/sendmail, you'll no longer get automated e-mails from your
system to alert you to a problem.
Does every node need to be running sendmail/postfix? In most cases, nodes should be fairly "dumb", in the sense of having as absolutely little as possible actively running. They largely need little more than an authentication service, a login/process start service, a disk service (NFS, panfs, glusterfs, ...).


One can always run xmlsysd instead, which is a very lightweight
on-demand information service.  It costs you, basically, a socket, and
you can poll the nodes to get their current runstate every five seconds,
every thirty seconds, every minute, every five minutes.  Pick a
granularity that drops its impact on a running computation to a level
you consider tolerable, while still providing you with node-level state
information when you need it.
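
A minimal sketch of that kind of polling loop, in Python. It is not
xmlsysd's own client: the port, the request string, and the names
fetch_state and poll_nodes are placeholders made up for illustration, so
check the daemon's documentation for the real protocol. The point is only
that the polling interval is the one knob that sets how often a node gets
interrupted.

```python
import socket
import time


def fetch_state(host, port, request=b"send\n", timeout=2.0):
    """Open one TCP connection, send a request, and return the raw reply.

    The port and the request bytes are assumptions for illustration,
    not a documented protocol.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            # The server kept the connection open; keep what arrived.
            pass
    return b"".join(chunks)


def poll_nodes(nodes, fetch, interval, rounds, sleep=time.sleep):
    """Poll every node once per round, pausing `interval` seconds between rounds.

    `fetch` is any callable taking a node name. A node that cannot be
    reached is recorded as None instead of aborting the round.
    """
    history = []
    for i in range(rounds):
        snapshot = {}
        for node in nodes:
            try:
                snapshot[node] = fetch(node)
            except OSError:
                snapshot[node] = None
        history.append(snapshot)
        if i < rounds - 1:
            sleep(interval)
    return history
```

Calling poll_nodes with a fetch that wraps fetch_state for your own port,
and an interval of 5, 30, 60 or 300 seconds, gives the granularities
mentioned above; a larger interval means fewer connections per node and
less disturbance to a running job.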

Just a thought...;-)

   rgb

My question is this: how extreme do you go in disabling non-essential
services on your cluster nodes? Do you turn off *everything* that's not
absolutely necessary, or do you leave some things running to make
administration easier?
As long as you have an ssh portal in as root, you should be fine for admin. Though, from an admin point of view, as you scale up the number of nodes, you want the admin load to remain constant, that is, not to scale with increasing node count. Moreover, you want to actively reduce the number of moving parts, as it were, as you scale up, as moving parts tend to break. These are things like installs, or images. We have customers who occasionally (against our advice) test the limits of their "cluster installer". What is interesting is that they can't *successfully* install/image more than about 20-24 nodes at a time. Yes, they can install more than that, but no, the systems they install that way seem to have some problems which go away at next reload.
Basically, as you scale up the system, you want to scale down, if not completely eliminate, node-level admin. You definitely don't want the nodes to be spending cycles (and therefore power, time, resources) on things that they really ought not to spend time on.
Joe
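
One way to keep admin effort flat as node count grows, given root ssh
access, is to fan a single command out to every node from the head node
with a cap on how many run at once. The sketch below assumes key-based
root ssh is already set up; the names ssh_run and run_everywhere and the
default cap of 16 are illustrative choices, not anything from the thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(node, command, timeout=30):
    """Run one command on one node as root over ssh.

    Returns (exit code, stdout). BatchMode=yes makes ssh fail instead of
    prompting for a password, so a broken node cannot hang the sweep.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"root@{node}", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def run_everywhere(nodes, command, runner=ssh_run, max_parallel=16):
    """Run `command` on every node, at most `max_parallel` at a time.

    `nodes` is a list of host names. A node whose runner raises is
    reported as (None, error text) rather than stopping the others.
    """
    def safe(node):
        try:
            return runner(node, command)
        except (OSError, subprocess.SubprocessError) as exc:
            return None, str(exc)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(safe, nodes))
    return dict(zip(nodes, results))
```

Something like run_everywhere(node_list, "uptime") then returns a
dictionary keyed by node name, and the same call works unchanged for 8
nodes or 800; only max_parallel needs tuning to what the head node and
network tolerate.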
I'm curious to see how everyone else has their cluster(s) configured.


--
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977