On 07/28/2014 03:07 PM, Joe Landman wrote:
On 7/28/14, 2:55 PM, Prentice Bisbal wrote:
On 07/28/2014 01:29 PM, Jeff White wrote:
Power draw will vary greatly depending on many factors. Where I am, we currently have 16 racks of HPC equipment (compute nodes, storage, network gear, etc.) using about 140 kVA but can use up to 160 kVA. A single rack with 26 compute nodes, each with 64 cores' worth of AMD 6276 (Supermicro boxes), is using about 18 kW across the PDUs, 3-phase at 240 volts, with most of the nodes at 100% CPU usage.

Agreed, there's a lot of variability. Since I don't know exactly what's going into my new space yet, I'm looking for everyone's input to come up with an average, or ballpark, amount. The 5 - 10 kW one vendor specified seems waaaay too low for a rack of high-density HPC nodes running at or near 100% utilization.

Seriously, don't design for average; shoot for the worst-case scenario. Nothing sucks so much as having too low a power or cooling budget and a big new shiny that can't be fully turned on thanks to that.

This is exactly what I'm trying to do. I assume HPL will provide a worst case scenario, based on the average of everyone else's worst case scenario. I know that doesn't make sense, but I need to eliminate outliers that are extremely high density, like HP's new Apollo systems. If my systems don't have enough power to run HPL, I can't even perform acceptance testing!


I can't speak to what other vendors say/do in this regard, but I can say that we try to make sure we never use more than 50% of the capacity of any particular PDU, and that the PDUs have enough headroom to be able to handle sudden loads (say, one of the PDUs falling over).

In engineering, they call this a safety factor. When I was in school, a common safety factor was something like worst-case scenario + 20%, but extreme safety considerations, like bridges or amusement park rides, got a much higher safety factor.
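As a quick sketch of that rule of thumb (the node count and per-node wattage below are made-up placeholders, not figures from anyone's actual cluster):

    # Back-of-the-envelope safety factor, hypothetical numbers only.
    nodes_per_rack = 24
    watts_per_node = 750          # assumed worst-case draw per node under full load
    worst_case_kw = nodes_per_rack * watts_per_node / 1000.0   # 18.0 kW
    design_kw = worst_case_kw * 1.20                            # worst case + 20% safety factor
    print("design for %.1f kW per rack" % design_kw)            # 21.6 kW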

We've had a situation (years ago) where we were pressed not to "over-spec" the power, and despite our protests, this is what was installed. First time a PDU tripped a breaker (did I mention that they overloaded our original design? No? Well ...), all the load hit the second PDU, full force. This was not pretty.

The cost to "over spec" is in the noise relative to the opportunity cost for under spec'ing, not to mention the "additional" cost of more power (and cooling ... don't forget the cooling!).

I agree. If I overspec, no one will notice, except the accountants. If I underspec, and we can't use the datacenter at its designed capacity, everyone will notice, and it will be an embarrassment for our group.

You can set the maximum boundary on power pretty easily with maximum draw per node and basic math. This ignores inrush current and power, but let's assume you do a phased power-on (1-3 second intervals between nodes). If you want to hit all the power buttons at once, just make sure you have enough headroom for that inrush.
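Something like the following sketch, where the per-node figures and the inrush multiplier are assumptions you'd replace with numbers from the vendor's PSU/node spec sheets:

    # All figures here are assumptions for illustration; use real spec-sheet values.
    nodes = 26
    max_watts_per_node = 700       # assumed worst-case steady-state draw per node
    inrush_factor = 1.5            # assumed brief inrush multiple at power-on
    steady_state_kw = nodes * max_watts_per_node / 1000.0   # 18.2 kW
    # Phased power-on (1-3 s apart): at most one node inrushing at a time, so a
    # conservative peak is the full steady-state load plus one node's inrush excess.
    phased_peak_kw = steady_state_kw + max_watts_per_node * (inrush_factor - 1) / 1000.0
    # All power buttons at once: every node could be inrushing simultaneously.
    all_at_once_kw = steady_state_kw * inrush_factor
    print("steady %.1f kW, phased peak %.1f kW, all-at-once %.1f kW"
          % (steady_state_kw, phased_peak_kw, all_at_once_kw))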

It's not a dark art per se, but be quite aggressive in what you think your power draws are going to be. Use that to set your upper bound, and assume you don't want to run your PDUs at 75% of capacity normally (though under extreme load, with half of your other PDUs offline, that isn't a bad target).
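A quick way to sanity-check the PDU side (the load and capacity below are placeholders, not anything measured here):

    # Hypothetical rack on redundant PDUs; substitute real capacities.
    rack_load_kw = 13.0                 # assumed worst-case rack draw
    pdu_capacity_kw = 17.3              # e.g. 208 V 3-phase, 60 A breaker at 80% continuous
    num_pdus = 2                        # redundant feeds
    normal_util = rack_load_kw / (num_pdus * pdu_capacity_kw)          # ~38% per PDU
    one_pdu_down = rack_load_kw / ((num_pdus - 1) * pdu_capacity_kw)   # ~75% on the survivor
    print("normal %.0f%% per PDU, one PDU down %.0f%%"
          % (normal_util * 100, one_pdu_down * 100))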

I want to be very aggressive and allow excess capacity as a safety margin and for future growth, but we're hitting our budget limits, and some are trying to 'right size' our power and cooling, which I'm afraid could be disastrous. Some involved in the discussion have stated only 5 - 10 kW per full rack, which is too small. Since I don't know exactly what systems I'm going to get from my RFP, I can't do exact calculations based on specific models. I could do a few different models, but that can be time-consuming, and it's not always easy to get all that information from the vendors.



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf