On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can 
take surprisingly long, so there's another tradeoff there.

Not really, especially not with NVMe drives. I have NVMe drives in both my laptop and my desktop, and it is startling how fast they boot and resume from suspend.
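As a rough back-of-envelope check on how long spooling RAM to disk might take, here is a minimal sketch. The RAM sizes and sustained write bandwidths below are illustrative assumptions, not measurements from any machine mentioned in this thread, and it assumes all of RAM is written uncompressed.

```python
def spool_seconds(ram_gib: float, write_gib_per_s: float) -> float:
    """Time in seconds to write ram_gib of memory at a sustained write rate."""
    if write_gib_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    return ram_gib / write_gib_per_s


# Assumed, illustrative figures only (label, RAM in GiB, write rate in GiB/s).
scenarios = [
    ("laptop, 16 GiB RAM, NVMe at 2 GiB/s", 16, 2.0),
    ("server, 512 GiB RAM, NVMe at 2 GiB/s", 512, 2.0),
    ("server, 512 GiB RAM, SATA SSD at 0.5 GiB/s", 512, 0.5),
]

for label, ram_gib, rate in scenarios:
    print(f"{label}: about {spool_seconds(ram_gib, rate):.0f} s")
```

Under these assumptions the time scales linearly with memory size, so a desktop-sized image and a large-memory compute node can land in quite different ranges on the same class of drive.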

I think the bigger issue with this approach is whether enterprise servers would support it. I believe it requires some level of hardware support, which I doubt servers designed for constant-on use have. Someone please jump in and correct me if I'm wrong here.

Prentice

On 7/16/21 8:38 PM, Lux, Jim (US 7140) via Beowulf wrote:
An interesting question.
The power cycling reliability thing is probably not a big deal - the temperatures change 
a lot between light load and heavy load already, and if a "server class" PC 
can't take a power cycle per day, when the grungiest consumer unit can do it, I'd be 
surprised. It's not like you're cycling between -40C and 70C every hour like in an 
automotive application.

Managing the chillers, though - That might be a bigger problem.

And as Jörg points out, there's a fair amount of sophistication needed in 
setting your turn on and turn off thresholds.
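To illustrate why separate turn-on and turn-off thresholds matter, here is a minimal sketch of a hysteresis rule. The function name, the 30-minute idle limit, and the queue-depth trigger are hypothetical values chosen for illustration; they are not taken from Slurm or from any site's configuration.

```python
def next_power_state(
    powered_on: bool,
    idle_minutes: float,
    queued_jobs: int,
    off_after_idle_min: float = 30.0,  # assumed turn-off threshold
    on_at_queue_depth: int = 1,        # assumed turn-on threshold
) -> bool:
    """Return True if the node should be powered on after this decision."""
    if powered_on:
        # Power down only after a long idle stretch with no work waiting.
        if idle_minutes >= off_after_idle_min and queued_jobs == 0:
            return False
        return True
    # Power up only when enough work is waiting in the queue.
    return queued_jobs >= on_at_queue_depth


# A node that has been idle for 10 minutes stays on; after 45 minutes it is turned off.
print(next_power_state(True, idle_minutes=10, queued_jobs=0))
print(next_power_state(True, idle_minutes=45, queued_jobs=0))
```

Because the turn-off condition is deliberately harder to meet than the turn-on condition, a node does not flap between states when the load hovers near a single cutoff.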

On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can 
take surprisingly long, so there's another tradeoff there.


On 7/16/21, 12:35 PM, "Beowulf on behalf of Douglas Eadline" 
<beowulf-boun...@beowulf.org on behalf of deadl...@eadline.org> wrote:


     Hi everyone:

     Reducing power use has become an important topic. One
     of the questions I have always wondered about is
     why more clusters do not turn off unused nodes. Slurm
     has hooks to turn nodes off when not in use and
     turn them on when resources are needed.

     My understanding is that power cycling creates
     temperature cycling, which then leads to premature node
     failure. That makes sense, but has anyone ever studied
     or tested this?

     The only other reasons I can think of are that the delay
     in server boot time makes job starts slow, or that
     powering nodes on causes power surge issues.

     I'm curious about other ideas or experiences.

     Thanks

     --
     Doug





     _______________________________________________
     Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
     To change your subscription (digest mode or unsubscribe) visit 
     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

