Mark Hahn wrote:
BTW, where a lot of people are jumping on the "Get IPMI" bandwagon,
I suggest getting PDUs with remote IP controlled ports is more useful.
the thing I don't like about controlled PDUs is that they're pretty
harsh - don't you expect a higher failure rate of node PSUs if you go
yanking the power this way?
Why?
If nodes shut down on command from the scheduler, that is good.
And, if they do not, how is cutting power by the PDU socket any
different than a power switch on the node?
Obviously we want to avoid "dropping the hammer" on a mounted
filesystem, at least until its cache has been flushed. That is not
hard to accomplish.
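As a rough sketch of that sequence (ask the node to shut down cleanly,
wait, and only then switch the outlet), something like the following
would do. The three callables are hypothetical placeholders for
whatever your scheduler, ping check, and PDU vendor actually provide;
the timeout values are assumptions, not recommendations.

```python
import time
from typing import Callable


def power_off_node(
    request_shutdown: Callable[[], None],  # hypothetical: e.g. ssh "shutdown -h now"
    is_down: Callable[[], bool],           # hypothetical: e.g. node stops answering ping
    cut_outlet: Callable[[], None],        # hypothetical: vendor-specific PDU call
    timeout_s: float = 120.0,              # assumed grace period
    poll_s: float = 5.0,                   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask the node to shut down cleanly, then cut its PDU outlet.

    Returns "clean" if the node went down by itself before the outlet
    was switched, or "forced" if the grace period ran out first.
    """
    request_shutdown()
    waited = 0.0
    while waited < timeout_s:
        if is_down():
            # Filesystems are unmounted by now, so cutting power is safe.
            cut_outlet()
            return "clean"
        sleep(poll_s)
        waited += poll_s
    # The node never went down: this is the "power switch" case.
    cut_outlet()
    return "forced"
```

The "forced" branch is no harsher than holding in the power button on
the node itself.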
I have only seen a handful of different IPMI interfaces, but they all
were reasonably reliable.
I have used the Supermicro, Tyan, ASUS, and Dell, and they all had some
tendency to choke sometimes.
The thing is, at the nominal cost of $50 to $100 per machine for BMC
(IPMI) cards, one can buy a couple of network controlled PDUs,
with the thermal and humidity sensors.
As you are likely to buy at least "dumb" PDUs anyway, the cost this
adds is usually around $30 per node, resulting in a tidy savings.
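To put numbers on that, here is the arithmetic as a small sketch. The
$50 to $100 BMC cost and the roughly $30 PDU premium are the figures
above; the 32-node cluster size is purely an assumption for
illustration.

```python
def per_node_savings(bmc_cost: float, pdu_premium: float = 30.0) -> float:
    """Savings per node when a switched PDU replaces a BMC card.

    pdu_premium is the extra cost per node of a switched PDU over a
    "dumb" one (about $30, per the estimate above).
    """
    return bmc_cost - pdu_premium


nodes = 32  # assumed cluster size, for illustration only
for bmc_cost in (50.0, 100.0):
    saved = per_node_savings(bmc_cost) * nodes
    print(f"BMC at ${bmc_cost:.0f}/node: about ${saved:.0f} saved on {nodes} nodes")
```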
It also means you are "talking" to only one device per 10 to 30 nodes,
versus 10 to 30 BMC devices.
Further, these IPMI cards typically "steal" a GbE port on the nodes.
If you set your machines' BIOS to start on power up, it is trivial to
stop and start machines with the PDU power, and that is definitely
reliable.
huh? we're talking about network-attached IPMI, which is fully
independent of the controlled motherboard's bios. are you talking
about those hybrid systems where the IPMI controller shares an
ethernet port with the host? or IPMI through a kernel driver?
Either.
Most share a port, some have dedicated ports on board.
Plus, with a lot of those PDUs you can add thermal sensors and
trigger power off on high temperature conditions.
IPMI normally provides all the motherboard's sensors as well. it
seems like those are far more relevant than the temp of the PDU...
I would rather monitor the room temperature at the racks, and shut the
whole works down in a hurry if something is wrong, such as air
conditioning failure.
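The decision itself is simple. A minimal sketch, assuming the rack
readings have already been collected from the PDU sensors (how you
read them is vendor-specific and not shown); the 35 C limit and the
two-sensor rule are assumptions, so pick values that suit your room.

```python
from typing import Iterable


def should_shut_down(
    rack_temps_c: Iterable[float],
    limit_c: float = 35.0,  # assumed trip point, not a recommendation
    min_hot: int = 2,       # assumed: need 2 hot sensors, to ignore one bad probe
) -> bool:
    """True when at least min_hot rack sensors read at or above limit_c."""
    hot = sum(1 for t in rack_temps_c if t >= limit_c)
    return hot >= min_hot
```

Requiring more than one hot sensor keeps a single failed probe from
taking the whole room down.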
using lm_sensors is a poor substitute for IPMI.
Yes, and no.
For monitoring the temps and fans and such on nodes it is quite sufficient.
For power control it is useless, of course.
--
With our best regards,
Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.             FAX:       01-780-456-9772
11060 - 166 Avenue         email:[EMAIL PROTECTED]
Edmonton, AB, Canada       http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf