On Sun, Mar 24, 2024 at 2:38 PM Michael DiDomenico <mdidomeni...@gmail.com> wrote:
> thanks, there's some good info in there. just to be clear to others that
> might chime in, i'm less interested in the immersion/DLC debate than in
> getting updates from people that have sat on either side of the fence.
> DLC's been around a while and so has immersion, but what i can't get from
> sales glossies is real-world maintenance over time.
>
> being in the DoD space, i'm well aware of the HPE stuff, but they're also
> what's making me look at other stuff. i'm not real keen on 100+ kW racks;
> there are many safety concerns with that much amperage in a single
> cabinet. not to mention all that custom hardware comes at a stiff cost
> and in my opinion doesn't have a good ROI if you're not buying hundreds
> of racks' worth of it. but your space-constrained issue is definitely one
> i'm familiar with. our new space is smaller than i think we should build,
> but we're also geography constrained.
>
> the other info i'm seeking is futures. DLC seems like a right-now
> solution to ride the AI wave. i'm curious if others think DLC might hit a
> power limit sooner or later, like air cooling already has, given chips
> keep climbing in watts. and maybe it's not even a power limit per se, but
> DLC is pretty complicated with all the piping/manifolds/connectors/CDUs;
> does there come a point where it's just not worth it unless it's a big
> custom solution like the HPE stuff

The ORv3 rack design's maximum power is the number of power shelves times
the power per shelf. Reach out to me directly at <my first name> @ ornl.gov
and I can connect you with some vendors.
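
To make that arithmetic concrete, here is a minimal sketch in Python; the
shelf counts and per-shelf kW ratings are assumed example values, not ORv3
spec numbers:

    # Maximum rack power = number of power shelves x power per shelf.
    # Shelf counts and kW ratings below are illustrative assumptions only.
    def rack_power_kw(num_power_shelves: int, kw_per_shelf: float) -> float:
        return num_power_shelves * kw_per_shelf

    print(rack_power_kw(3, 18.0))   # 54.0 kW for a hypothetical light config
    print(rack_power_kw(6, 33.0))   # 198.0 kW, near the 200 kW/rack cited below
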
> On Sun, Mar 24, 2024 at 1:46 PM Scott Atchley <e.scott.atch...@gmail.com>
> wrote:
>
>> On Sat, Mar 23, 2024 at 10:40 AM Michael DiDomenico
>> <mdidomeni...@gmail.com> wrote:
>>
>>> i'm curious to know
>>>
>>> 1. how many servers per vat or U?
>>> 2. i saw a slide mention 1500 W/sqft; can you break that number into
>>> kW per vat?
>>> 3. can you shed any light on the heat exchanger system? it looks like
>>> there's just two pipes coming into the vat; is that chilled water or
>>> oil? is there a CDU somewhere off camera?
>>> 4. that power bar in the middle - is that DUG custom?
>>> 5. any stats on reliability? like, have you seen a decrease in hw
>>> failures?
>>>
>>> are you selling the vats/tech as a product? can i order one? :)
>>>
>>> since cpus are pushing 400 W/chip, nvidia is teasing 1000 W/chip coming
>>> in the near future, and i'm working on building a new site, i'm keenly
>>> interested in thoughts on DLC or immersion tech from anyone else too
>>
>> As with all things in life, everything has trade-offs.
>>
>> We have looked at immersion at ORNL and these are my thoughts:
>>
>> *Immersion*
>>
>>    - *Pros*
>>       - Low Power Usage Effectiveness (PUE) - as low as 1.03, meaning
>>       you spend only $0.03 on cooling for every $1.00 the system
>>       consumes in power (see the worked example below the quoted
>>       thread). In contrast, air-cooled data centers can range from 1.30
>>       to 1.60 or higher.
>>       - No special racks - you can install white-box servers and remove
>>       the fans.
>>       - No cooling loops - no fittings that can leak, get kinked, or be
>>       accidentally clamped off.
>>       - No bio-growth issues.
>>    - *Cons*
>>       - Low power density - take a vertical rack and lay it sideways.
>>       DLC allows the same power density with the rack standing vertical.
>>       - Messy - depends on the fluid, but oil is common and cheap. Many
>>       centers build a crane to hoist out servers and then let them drip
>>       dry for a day before servicing.
>>       - High Mean Time To Repair (MTTR) - unless you have two cranes,
>>       you cannot insert a new node until the old one has dripped dry and
>>       been removed from the crane.
>>       - Some solutions can be expensive and/or lead to part failures due
>>       to residue build-up on processor pins.
>>
>> *Direct Liquid Cooling (DLC)*
>>
>>    - *Pros*
>>       - Low PUE compared to air cooling; it depends on how much of the
>>       heat the water captures. Summit uses hybrid DLC (water for CPUs
>>       and GPUs, air for DIMMs, NICs, SSDs, and power supplies) with
>>       ~22°C water; Summit's PUE can range from 1.03 to 1.10 depending on
>>       the time of year. Frontier, on the other hand, is 100% DLC (no
>>       fans in the compute racks) with 32°C water; Frontier's PUE can
>>       range from 1.03 to 1.06 depending on the time of year. Both PUEs
>>       include the pumps for the water towers and for moving the water
>>       between the Central Energy Plant and the data center.
>>       - High power density - the HPE Cray EX 4000 "cabinet" can supply
>>       up to 400 kW and is equivalent in space to two racks (i.e., 200 kW
>>       per standard rack). If your data center is space constrained, this
>>       is a crucial factor.
>>       - No mess - DLC systems using deionized water (DI water) or
>>       propylene glycol water (PGW) have dripless connectors.
>>       - Low MTTR - remove a server and insert another if you have a
>>       spare.
>>    - *Cons*
>>       - Special racks - HPE cabinets are non-standard and require
>>       HPE-designed servers. This is changing; I saw many examples of
>>       ORv3 racks at GTC that use the OCP standard with DLC manifolds.
>>       - Cooling loops - loops can leak at fittings, or be kinked or
>>       crimped, which restricts flow and causes overheating. Hybrid loops
>>       are simpler, while 100% DLC loops are more complex (i.e.,
>>       expensive). Servers tend to include drip sensors to detect leaks,
>>       but we have found that the DIMMs are better drip sensors (i.e.,
>>       the drips hit them before finding the drip sensor). 😆
>>       - Bio-growth
>>          - DI water includes biocides, and you have to manage it. We
>>          have learned that no system can be bio-growth free (e.g.,
>>          inserting a blade will recontaminate the system). That said,
>>          Summit has never had any biogrowth-induced overheating, and
>>          Frontier has gone close to nine months without overheating
>>          issues due to growth.
>>          - PGW systems should be immune to bio-growth, but you lose ~30%
>>          of the heat-removal capacity compared to DI water (a rough
>>          sketch of that math follows below the quoted thread). Depending
>>          on your environment, you might be able to avoid trim water
>>          (i.e., mixing in chilled water to reduce the temperature).
>>       - Can be expensive to upgrade the facility (i.e., to install
>>       evaporative coolers, piping, pumps, etc.).
>>
>> For ORNL, we are space constrained. For that alone, we prefer DLC over
>> immersion.
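
On the PUE numbers quoted above, a minimal sketch of the arithmetic; the
20 MW IT load is an assumed example figure, not a Summit/Frontier number:

    # PUE = total facility power / IT equipment power, so the overhead
    # (cooling, distribution, etc.) per $1.00 of IT power is (PUE - 1).
    def overhead_per_it_dollar(pue: float) -> float:
        return pue - 1.0

    # immersion/DLC lows vs. a typical air-cooled value
    for pue in (1.03, 1.10, 1.45):
        print(f"PUE {pue:.2f}: ${overhead_per_it_dollar(pue):.2f} "
              f"per $1.00 of IT power")

    # For an assumed 20 MW IT load:
    print(f"{20 * (1.03 - 1):.1f} MW overhead at PUE 1.03 vs "
          f"{20 * (1.45 - 1):.1f} MW at PUE 1.45")
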
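And the rough PGW sketch promised above: the ~30% derate comes from the
discussion, while the 400 kW DI-water baseline is just an assumed example:

    # Heat-removal capacity on PGW, derated ~30% from a DI-water baseline.
    # Both the derate and the baseline are assumptions for illustration.
    def pgw_capacity_kw(di_capacity_kw: float, derate: float = 0.30) -> float:
        return di_capacity_kw * (1.0 - derate)

    # A loop sized for an assumed 400 kW cabinet on DI water:
    print(f"{pgw_capacity_kw(400.0):.0f} kW on PGW")  # ~280 kW, all else equal
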
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf