On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can take surprisingly long, so there's another tradeoff there.
Not really, especially not with NVMe drives. I have NVMe drives in both my laptop and my desktop, and it's startling how fast they boot and resume from suspend.
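A quick back-of-the-envelope calculation shows why. A minimal sketch, assuming a 256 GB node and typical sustained write bandwidths (both numbers are assumptions for illustration, not measurements):

    # Rough estimate of how long spooling all of RAM to disk takes.
    ram_gb = 256            # node RAM size (assumed)
    sata_ssd_gbps = 0.5     # sustained write GB/s, SATA SSD (assumed)
    nvme_gbps = 3.0         # sustained write GB/s, decent NVMe (assumed)

    print(f"SATA SSD: {ram_gb / sata_ssd_gbps:.0f} s")  # ~512 s, over 8 minutes
    print(f"NVMe:     {ram_gb / nvme_gbps:.0f} s")      # ~85 s

So NVMe turns a multi-minute checkpoint into something closer to a minute, which changes the tradeoff Jim describes.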
I think the bigger issue with this approach is whether enterprise servers support it at all. I believe some level of hardware and firmware support is required, which I doubt servers designed for always-on use provide. Someone please jump in and correct me if I'm wrong here.
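You can at least check what a given node's kernel and firmware advertise. A minimal sketch using the standard Linux sysfs files (run on the node itself; output varies by platform):

    # Check which sleep states the kernel/firmware expose.
    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "(not available)"

    # e.g. "freeze mem disk" -- "disk" means suspend-to-disk is offered
    print("sleep states:   ", read("/sys/power/state"))
    # e.g. "[platform] shutdown reboot suspend test_resume"
    print("hibernate modes:", read("/sys/power/disk"))

Whether the BMC and server firmware actually resume cleanly is another question; these files only tell you what the kernel is willing to try.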
Prentice

On 7/16/21 8:38 PM, Lux, Jim (US 7140) via Beowulf wrote:
An interesting question. The power cycling reliability thing is probably not a big deal - the temperatures change a lot between light load and heavy load already, and if a "server class" PC can't take a power cycle per day, when the grungiest consumer unit can do it, I'd be surprised. It's not like you're cycling between -40C and 70C every hour like in an automotive application.

Managing the chillers, though - that might be a bigger problem. And as Jörg points out, there's a fair amount of sophistication needed in setting your turn-on and turn-off thresholds.

On the "spool RAM to disk" idea - that's sort of like checkpointing, and it can take surprisingly long, so there's another tradeoff there.

On 7/16/21, 12:35 PM, "Beowulf on behalf of Douglas Eadline" <beowulf-boun...@beowulf.org on behalf of deadl...@eadline.org> wrote:

Hi everyone:

Reducing power use has become an important topic. One of the questions I always wondered about is why more clusters do not turn off unused nodes. Slurm has hooks to turn nodes off when not in use and turn them back on when resources are needed.

My understanding is that power cycling creates temperature cycling, which then leads to premature node failure. Makes sense, but has anyone ever studied/tested this?

The only other reasons I can think of are that the delay in server boot time makes job starts slow, or power surge issues.

I'm curious about other ideas or experiences.

Thanks

--
Doug
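For reference, the Slurm hooks Doug mentions are the power-saving parameters in slurm.conf. A minimal sketch, with placeholder script paths, node name, and timings (not a tested configuration):

    # slurm.conf power-saving sketch -- paths, names, and values are placeholders
    SuspendProgram=/usr/local/sbin/node_poweroff   # site script that powers idle nodes down
    ResumeProgram=/usr/local/sbin/node_poweron     # site script that powers nodes back up
    SuspendTime=1800         # seconds a node sits idle before suspension
    SuspendTimeout=120       # seconds allowed for a node to power down
    ResumeTimeout=600        # seconds allowed for a node to boot and rejoin
    SuspendExcNodes=head01   # example: never power off the head node

The ResumeTimeout value is where Doug's boot-delay concern shows up: jobs scheduled onto powered-off nodes wait for the resume to finish before they can start.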
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf