Well, I did pull out the NVIDIA GPU. One reason I suspected it is
that I had seen many messages about kernel oopses from the nouveau
driver. This GPU is just for ML; the console is connected to the
motherboard's built-in VGA (and normally all use is remote anyway). The
last time the system became unu
If it were a failing or weak power supply, it would just crash; nothing slows
down nicely when that happens.
An NVIDIA GPU will usually crash the hardware if it overstresses the
power supply, and it will also crash if it goes bad.
Now, overheating may cause the CPUs to throttle, and that may make the
machine fee
I've been running F32 on a shiny new dual AMD EPYC workstation for
about a year. The system is now remote from me and not convenient to
access.
About a week ago the system became unresponsive. I noticed I/O errors
in the logs, so I guessed it was an issue with the SSD. I
went there and re