Jon,

> I have a rack full of identical compute
> nodes. One of them has become heat sensitive.
> 
> When it's in the warm computer room it crashes.
> I can't even run memtest from the CentOS DVD
> for 2 seconds. However, when this node is
> in my much cooler office everything works
> fine. All the other nodes are working fine
> in the computer room.
I had a similar problem when one of the plastic clips that hold
the base ring of the CPU cooler broke, leaving the cooler held on
by the remaining 3 clips. When I started compiling OpenFOAM from
source in a virtual machine, Ubuntu shut down due to overheating.
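One way to check for a cooler-mounting problem like this without opening the case is to log CPU temperatures while the node is under load in the warm room, then compare the readings against a healthy sibling node running the same load. If the cooler is badly seated, the CPU temperature would be expected to climb much faster than on the other nodes. Below is a minimal sketch that assumes a Linux node exposing the kernel's thermal-zone interface under /sys/class/thermal. Which zones appear and what they are named vary by hardware. The function names are hypothetical.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone label: temperature in deg C} for each readable thermal zone.

    Assumes the Linux sysfs layout: each thermal_zoneN directory has a 'temp'
    file (millidegrees Celsius) and a 'type' file (short sensor name).
    """
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
        except (OSError, ValueError):
            # Skip zones that are unreadable or report a non-numeric value.
            continue
        label = f"{os.path.basename(zone)} ({kind})"
        readings[label] = millideg / 1000.0
    return readings


def log_temperatures(interval_s=2.0, samples=30, base="/sys/class/thermal"):
    """Print a timestamped line of all zone temperatures every interval_s seconds."""
    for _ in range(samples):
        readings = read_thermal_zones(base)
        line = ", ".join(f"{name}={temp:.1f}C" for name, temp in readings.items())
        print(time.strftime("%H:%M:%S"), line, flush=True)
        time.sleep(interval_s)
```

To use it, call log_temperatures() in one terminal while memtest or a large compile runs on the node. If the package temperature shoots up within seconds while the same load stays cool on a neighbouring node, the cooler is the first thing to inspect.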
> 
> I'm not convinced the problem is actually
> the memory. Other than opening the node
> to spray cooling liquid when it's in the warm
> room, what approach would you use to figure out which
> component(s) is(are) failing?
> 
> Cordially,
> -- 
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> 94720-1460
> 510-643-1032
> jlforr...@berkeley.edu
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
>  

Sincerely,
Dmitry
