monolithic all-or-none creature. From what you write (and my online
reading) it seems there are several discrete parts:

IPMI 2.0
switched remotely accessible PDUs
"serial concentrator type system"

I think Joe was going a bit belt-and-suspenders-and-suspenders here.

IPMI normally provides out-of-band access to the system's I2C bus
(which lets one power on/off, reset, and read the sensors). it also
normally provides some form of console access, usually by serial
redirection (serial output can be redirected through the BMC and onto
the net). independent of this (but usually also provided) is a bios
feature which scrapes the video character array onto serial, thus giving
access to bios output. also technically independent, but usually also
provided, is lan->bmc->serial->bios "keyboard" input.
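as a rough illustration of those three functions (power, sensors, console), here is a minimal sketch that only builds and prints ipmitool command lines; it does not run them. the BMC hostname and account are made up, and I'm assuming an IPMI 2.0 "lanplus" session with the password supplied through the IPMI_PASSWORD environment variable (ipmitool's -E option).

```python
import shlex


def ipmitool_cmd(host, user, *subcommand):
    """Build an ipmitool argv for an IPMI 2.0 (lanplus) session.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, keeping it off the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            *subcommand]


# hypothetical BMC address and account, for illustration only
bmc, user = "node01-bmc.example.org", "admin"

commands = {
    "power status": ipmitool_cmd(bmc, user, "chassis", "power", "status"),
    "power cycle": ipmitool_cmd(bmc, user, "chassis", "power", "cycle"),
    "sensors": ipmitool_cmd(bmc, user, "sdr", "list"),
    "console": ipmitool_cmd(bmc, user, "sol", "activate"),  # serial-over-LAN
}

# print the command lines rather than executing them
for name, argv in commands.items():
    print(f"{name:13s} {shlex.join(argv)}")
```

whether a given BMC honors all of these depends on the vendor's implementation.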

some people also configure systems with network-aware PDUs (power bars):
APC is a common provider of these, and they provide a backup if IPMI
doesn't work for some reason (network problems, hung BMC, etc).
I do not personally think they are worthwhile because I rarely see IPMI
problems, though admittedly that may be due to the fairly narrow range of
parts my organization has. smart PDUs sometimes also provide power
monitoring, which might be useful, though I would actually prefer to see
IPMI simply provide current sensors via I2C (in addition to volts).
(having both socket power and motherboard power might be amusing, though,
since you could calculate your PSU's efficiency, potentially even its
load-efficiency curve. most vendors now quote 92-93% efficiency, but it's
unclear what load range that's for...)
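the efficiency arithmetic is just DC power delivered divided by AC power drawn at the socket. a minimal sketch, with entirely made-up readings (these are not measurements from any real PDU or BMC), and assuming the board-side sensors actually account for all of the PSU's output:

```python
def psu_efficiency(dc_watts, ac_watts):
    """Fraction of socket (AC) power that reaches the board as DC power."""
    if ac_watts <= 0:
        raise ValueError("AC power must be positive")
    return dc_watts / ac_watts


# hypothetical (socket watts from the PDU, board watts from IPMI sensors)
samples = [(60.0, 48.0), (150.0, 135.0), (300.0, 276.0)]

# one point per load level gives a crude load-efficiency curve
for ac, dc in samples:
    print(f"{ac:6.1f} W at socket -> {psu_efficiency(dc, ac):.1%} efficient")
```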

finally, I think Joe is advocating another layer of backup: serial
concentrators that would connect to the console serial port on each node
to collect output if IPMI SOL isn't working. this is perhaps a matter of
taste, but I don't find this terribly useful. I thought it would be for
my first cluster, but never actually set it up. but again, that's
because IPMI works well in my experience.

I think Joe's right in the sense that you _don't_ want a cluster without
working power control, and working POST/console redirection is pretty
valuable as well. both become more critical with larger cluster sizes,
mainly because the chances grow of hitting a problem where you need
power/reset/console control. whether you need backup systems past IPMI
is unclear: it depends on whether your IPMI works well.
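to put a rough number on the cluster-size point: if each node independently has probability p of needing remote power/reset/console intervention in some period, the chance that at least one of n nodes does is 1 - (1 - p)^n. a small sketch, where the 1% per-node figure is purely an assumption for illustration and independence between nodes is assumed too:

```python
def p_any_node_needs_help(p_node, n_nodes):
    """Probability that at least one of n independent nodes has a problem."""
    return 1.0 - (1.0 - p_node) ** n_nodes


p = 0.01  # assumed per-node probability per period, not a measured value
for n in (1, 16, 128, 1024):
    print(f"{n:5d} nodes: {p_any_node_needs_help(p, n):.1%}")
```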

Correct me if I am wrong but these are all "options" and varying
vendors and implementations will offer parts or all or none of these?
Or is it that when one says "IPMI 2" it includes all these features. I

I interpreted Joe as saying that you need IPMI2 (remote power/reset/console)
as well as backup mechanisms for IPMI failures.

hard to translate jargon across vendors. e.g. for Dell they are called
DRAC's etc.

vendors provide IPMI features, usually with added proprietary nonsense.
sometimes they sacrifice parts of IPMI in favor of the proprietary crap...

Finally, what's a "serial concentrator"? Isn't that the same as the
SOL that Skylar was explaining to me? Or is that something different
too?

a network-accessible box into which many serial ports plug. some let you
transform a serial port into a syslog stream, for instance.
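the serial-to-syslog idea amounts to tagging each console line with the node it came from and handing it to a logger. a minimal sketch of that, not how any particular concentrator does it: the node name is hypothetical, an in-memory stream stands in for the serial port, and the logger just writes to stderr (Python's logging.handlers.SysLogHandler could be attached instead to send to a real syslog daemon).

```python
import io
import logging


def forward_console(stream, node, logger):
    """Log each non-empty console line from `stream`, tagged with `node`.

    Returns the number of lines forwarded.
    """
    count = 0
    for raw in stream:
        line = raw.rstrip("\r\n")  # serial consoles usually send CR LF
        if line:
            logger.info("%s console: %s", node, line)
            count += 1
    return count


logger = logging.getLogger("console")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())  # stand-in for a syslog handler

# in-memory stand-in for one node's serial port output
fake_port = io.StringIO("Booting kernel...\r\n\r\nlogin: \r\n")
forward_console(fake_port, "node01", logger)
```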
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
