Gerry,

Like others, I too use ganglia, and have a custom script which reports CPU temperatures (and fan speeds) for the nodes. However, I changed ganglia's default method of communication (multicast) to reduce the chatter. I use a unicast setup, where each node reports directly to the monitoring server, which is a dedicated machine for monitoring all the systems and performing other tasks (DHCP, NTP, imaging, etc.).
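
The script is basically a cron job wrapped around gmetric, and the unicast change is just a couple of stanzas in gmond.conf on each node. A rough sketch of both follows; the monitoring-server address, port, and sysfs paths are assumptions for illustration, not exactly what I run.

  # gmond.conf: send everything straight to the monitoring server
  # (the stock config uses mcast_join lines here instead)
  udp_send_channel {
    host = 192.168.1.10   # hypothetical address of the monitoring server
    port = 8649
  }

  # only the monitoring server actually needs to listen
  udp_recv_channel {
    port = 8649
  }

And the temperature reporter, more or less:

  #!/usr/bin/env python
  # Rough sketch: read hwmon temperatures from sysfs and push them to
  # ganglia as extra metrics via gmetric. Paths and metric names are
  # illustrative; coretemp/fan readings may live elsewhere on your kernel.
  import glob
  import subprocess

  for path in glob.glob("/sys/class/hwmon/hwmon*/temp*_input"):
      with open(path) as f:
          celsius = int(f.read().strip()) / 1000.0   # sysfs reports millidegrees
      label = path.split("/")[-1].replace("_input", "")   # e.g. "temp1"
      subprocess.check_call([
          "gmetric",
          "--name",  "cpu_%s" % label,
          "--value", "%.1f" % celsius,
          "--type",  "float",
          "--units", "Celsius",
      ])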

Each node uses less than 1 KB/s to transmit all the ganglia information, including my extra metrics. For the useful historical record you get from this data, it's worth the rather small amount of network chatter. You can tune the metrics further, turn off the ones you don't want, or have them report less often.
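
The tuning is also done in gmond.conf, roughly like this (a sketch, with made-up numbers):

  # collect load_one once a minute, and only resend it after five minutes
  # (or sooner, if it changes by more than value_threshold)
  collection_group {
    collect_every = 60
    time_threshold = 300
    metric {
      name = "load_one"
      value_threshold = "1.0"
    }
  }

Dropping the collection_group entries for metrics you don't care about (or stretching the thresholds) quiets things down further.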

I'd suggest installing it; if you still think it is too chatty, remove it and look for another option. I find it useful in that you can see when a node died, what the load on it was when it crashed, what the network traffic looked like, etc.

I also use cacti, but only for the head servers, switches, etc. I find it has too much overhead for the nodes. It is, however, useful in that it can send email alerts about problems and allows graphing of SNMP devices.

Craig.

Gerry Creager wrote:
Now, for the flame-bait. Bernard suggests cacti and/or ganglia to handle this. Our group has heard some mutterings that ganglia is a "chatty" application and could cause some hits on our 1 GbE interconnect fabric.

A little background on our current implementation: 126 dual quad-core Xeon Dell 1950s interconnected with gigabit Ethernet. No, it's not the world's best MPI machine, but it should... and does... perform admirably for throughput applications where most jobs can run on a node (or two) but don't use MPI as much as, e.g., OpenMP, or in some cases even run on a single core but use all the RAM.

So, we're worried a bit about having everything talk on the same gigabit backplane, hence, so far, no ganglia.

What are the issues I might want to worry about in this regard, especially as we expand this cluster to more nodes (potentially going to 2k cores, or essentially doubling)?
