Mark Hahn wrote:
BTW, since a lot of people are jumping on the "Get IPMI" bandwagon, I suggest that getting PDUs with remotely IP-controlled ports is more useful.

the thing I don't like about controlled PDUs is that they're pretty harsh: don't you expect a higher failure rate of node PSUs if you go yanking the power this way?
Why?
If nodes shut down on commands from the scheduler, that is good.
And if they do not, how is cutting power at the PDU socket any different from flipping the power switch on the node? Obviously we want to avoid "dropping the hammer" on a mounted filesystem, at least until its cache has been flushed. That is not hard to accomplish.
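A minimal Python sketch of the "shut down cleanly first, then cut the outlet" sequence. The three callables are hypothetical hooks you would wire to your own scheduler or ssh, a ping-style liveness check, and your PDU's control interface (which is vendor-specific); the timeout, poll, and settle values are arbitrary assumptions, not recommendations.

```python
import time
from typing import Callable


def graceful_power_off(
    request_shutdown: Callable[[], None],
    is_alive: Callable[[], bool],
    cut_outlet_power: Callable[[], None],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    settle_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Ask a node to halt cleanly, then switch off its PDU outlet.

    Returns True if the node stopped responding before the timeout,
    False if the timeout expired and power was cut anyway.
    """
    request_shutdown()  # e.g. a scheduler- or ssh-initiated "shutdown -h now"
    deadline = clock() + timeout_s
    went_down = False
    while clock() < deadline:
        if not is_alive():
            went_down = True
            break
        sleep(poll_s)
    if went_down:
        # A node can stop answering the network before the halt has finished
        # flushing caches and unmounting filesystems, so wait a bit longer.
        sleep(settle_s)
    # Either the OS has halted, or it hung: cutting power now is no worse
    # than pressing the node's own power switch.
    cut_outlet_power()
    return went_down
```

The `sleep` and `clock` parameters exist only so the timing can be faked in tests; in normal use you would leave them at their defaults.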

I have only seen a handful of different IPMI interfaces, but they all were reasonably reliable.

I have used the Supermicro, Tyan, ASUS, and Dell implementations, and they all had a tendency to choke occasionally. The thing is, for the nominal cost of $50 to $100 per machine for BMC (IPMI) cards, one can buy a couple of network-controlled PDUs with thermal and humidity sensors.
Since you are likely to buy at least "dumb" PDUs anyway, the added cost is usually around $30 per node, resulting in a tidy savings.
It also means you are "talking" to only one device per 10 to 30 nodes, versus 10 to 30 BMC devices.
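To make the cost argument concrete, here is a small Python calculation using only the figures quoted above: $50 to $100 per node for a BMC, roughly $30 per node extra for switched PDUs, and one PDU per 10 to 30 nodes. The function name and the cluster sizes used with it are hypothetical.

```python
import math
from typing import NamedTuple


class Comparison(NamedTuple):
    bmc_cost: float     # total spent on per-node BMC cards
    pdu_cost: float     # extra spent on switched PDUs vs. dumb ones
    savings: float      # bmc_cost - pdu_cost
    bmc_endpoints: int  # management devices to talk to with IPMI
    pdu_endpoints: int  # management devices to talk to with switched PDUs


def compare_power_control(nodes: int, bmc_per_node: float,
                          pdu_extra_per_node: float,
                          nodes_per_pdu: int) -> Comparison:
    """Compare per-node BMCs against switched PDUs for one cluster."""
    bmc_cost = nodes * bmc_per_node
    pdu_cost = nodes * pdu_extra_per_node
    return Comparison(
        bmc_cost=bmc_cost,
        pdu_cost=pdu_cost,
        savings=bmc_cost - pdu_cost,
        bmc_endpoints=nodes,                              # one BMC per node
        pdu_endpoints=math.ceil(nodes / nodes_per_pdu),   # partial PDU still counts
    )
```

For a hypothetical 60-node cluster with 20 nodes per PDU, `compare_power_control(60, 50, 30, 20)` gives $3,000 for BMCs versus $1,800 for PDUs (a $1,200 saving) and 3 PDU endpoints instead of 60; at $100 per BMC the saving rises to $4,200.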

Further, these IPMI cards typically "steal" a GbE port on the nodes.


If you set your machines' BIOS to power on when AC power is applied, it is trivial to stop and start machines by switching the PDU outlet, and that is definitely reliable.

huh? we're talking about network-attached IPMI, which is fully independent of the controlled motherboard's BIOS. are you talking about those hybrid systems where the IPMI controller shares an Ethernet port with the host, or IPMI through a kernel driver?

Either.
Most share a port; some have dedicated ports on board.

Plus, with a lot of those PDUs you can add thermal sensors and trigger a power-off on high-temperature conditions.

IPMI normally provides all the motherboard's sensors as well. it seems like those are far more relevant than the temp of the PDU...
I would rather monitor the room temperature at the racks, and shut the whole works down in a hurry if something is wrong, such as air conditioning failure.
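One way to "shut the whole works down in a hurry" without tripping on a single bad reading is a latched threshold that fires only after several consecutive over-limit samples. This is a Python sketch under assumptions: the class name, the 35 °C limit, the three-sample requirement, and acting on the hottest rack sensor are all illustrative, and how you read the PDU's sensor and what you do when it trips are left to you.

```python
class OverTempTrip:
    """Latching over-temperature trip.

    Fires once `consecutive` successive readings are at or above `limit_c`
    and stays tripped until reset() is called.
    """

    def __init__(self, limit_c: float, consecutive: int = 3) -> None:
        if consecutive < 1:
            raise ValueError("consecutive must be >= 1")
        self.limit_c = limit_c
        self.consecutive = consecutive
        self._run = 0  # current streak of over-limit readings
        self.tripped = False

    def update(self, temp_c: float) -> bool:
        """Feed one reading (deg C); return True if the trip has fired."""
        if temp_c >= self.limit_c:
            self._run += 1
        else:
            self._run = 0  # a single normal reading breaks the streak
        if self._run >= self.consecutive:
            self.tripped = True
        return self.tripped

    def reset(self) -> None:
        """Clear the latch, e.g. after the air conditioning is fixed."""
        self._run = 0
        self.tripped = False


if __name__ == "__main__":
    trip = OverTempTrip(limit_c=35.0, consecutive=3)
    # Hypothetical samples: hottest rack-inlet temperature per polling interval.
    for hottest in (28.0, 36.5, 31.0, 36.0, 37.2, 38.9):
        if trip.update(hottest):
            print(f"over-temperature at {hottest} C: power off all outlets")
            break
```

Latching matters here: once the room is known to be overheating, you do not want nodes to come back just because one later reading dipped below the limit.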

using lm_sensors is a poor substitute for IPMI.
Yes and no.
For monitoring temperatures, fans and such on the nodes, it is quite sufficient.
For power control it is useless, of course.
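lm_sensors (through libsensors) gets its readings from the kernel's hwmon sysfs files, so node temperature and fan monitoring can also poll those files directly. A minimal Python sketch, assuming a Linux kernel that exposes sensors as `/sys/class/hwmon/hwmonN/temp*_input` (millidegrees Celsius) and `fan*_input` (RPM); some older drivers put these files under `hwmonN/device/`, which this skips, and it does not apply the per-board labels or scaling corrections from lm_sensors' `sensors.conf`.

```python
from pathlib import Path
from typing import Dict, Optional

# (glob pattern, divisor) per the kernel hwmon sysfs interface:
# temp*_input is in millidegrees Celsius, fan*_input is in RPM.
_SENSOR_KINDS = (("temp*_input", 1000.0), ("fan*_input", 1.0))


def _read_int(path: Path) -> Optional[int]:
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # unreadable or unpopulated sensor


def read_hwmon(root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Return {"hwmonN/chip/sensor": value} for all temperature and fan inputs."""
    readings: Dict[str, float] = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.is_file() else "unknown"
        for pattern, divisor in _SENSOR_KINDS:
            for f in sorted(chip_dir.glob(pattern)):
                raw = _read_int(f)
                if raw is None:
                    continue
                sensor = f.name[: -len("_input")]  # "temp1_input" -> "temp1"
                readings[f"{chip_dir.name}/{chip}/{sensor}"] = raw / divisor
    return readings


if __name__ == "__main__":
    for key, value in sorted(read_hwmon().items()):
        print(f"{key}: {value:g}")
```

As the post says, this covers monitoring only; it gives you no way to power a hung node off or on.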


--
With our best regards,

Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.             FAX:       01-780-456-9772
11060 - 166 Avenue         email:     [EMAIL PROTECTED]
Edmonton, AB, Canada       http://www.harddata.com/
T5X 1Y3
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org