Rushat Rai wrote
I don't know if this has been mentioned, but ECC could be slowing down that specific node if it has a faulty stick.
To find the bad stick one often must disable ECC. At least that was the case many years ago, the last time I ran into this. With ECC enabled, a somewhat defective stick may still pass memtest86+, since ECC can correct single-bit errors before the test ever sees them. That utility will show whether ECC is enabled, and the ECC disable option, if there is one, is set in the motherboard BIOS.
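As a complement to the memtest86+ route, and not something I have verified on this cluster: on a Linux node with the EDAC driver loaded, the kernel exposes corrected (ce_count) and uncorrected (ue_count) error counters per memory controller under /sys/devices/system/edac/mc. A minimal sketch for reading them follows. The function name read_edac_counts is made up for illustration, and it assumes the counters are present at that path, which depends on the kernel and chipset.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    Assumption: the EDAC driver for this chipset is loaded. If it is not,
    the directory is missing or empty and the result is an empty dict.
    """
    counts = {}
    for mc in sorted(Path(base).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts
```

A ce_count that keeps climbing on the slow node while it stays at zero on its neighbors would point toward a stick that ECC is quietly papering over.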
I'm late to this thread, so this may have been covered: does this node have a local disk? A failing disk can really slow things down if the device has to read the same block many times before it succeeds. That usually shows up in smartctl.
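For the smartctl check, here is a rough sketch that pulls a few attributes out of "smartctl -A" output and reports any with a nonzero raw value. The attribute names in WATCHED are my own choice of the usual suspects for ATA disks, the helper names are invented for illustration, and /dev/sda is only a placeholder device. NVMe drives report a different format and would not be matched by this parser.

```python
import subprocess

# Attributes that commonly rise when a disk is remapping or retrying sectors.
# This list is an assumption; adjust it for the drives in question.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for watched rows of `smartctl -A` output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # ATA attribute rows have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                found[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value not a plain integer; skip rather than guess
    return found


def suspicious(attrs):
    """Names of watched attributes whose raw value is above zero."""
    return sorted(name for name, raw in attrs.items() if raw > 0)


def check_disk(device="/dev/sda"):
    """Run smartctl on one device (usually needs root) and list suspicious attributes."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return suspicious(parse_smart_attributes(result.stdout))
```

Calling check_disk() on the slow node and on a healthy one gives a quick comparison. An empty list on both does not prove the disk is fine, it only means these particular counters are clean.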
What sort of network connection does it have? Try swapping the cables. Also run the network throughput test of your choice. If the problem is there, those tests will reveal it.
"sensors" should show roughly the same values as on the other nodes. If it does not, figure out why. As others have suggested, that could be blocked ventilation, but more often in my experience it is a fan on the way out.
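The "roughly the same as the other nodes" comparison can be sketched as below, assuming you have already collected one temperature reading per node (for example by running sensors on each). The node names, the readings, and the 10 degree tolerance are all made up for illustration; the same function would work on fan speeds with a suitable tolerance.

```python
from statistics import median


def flag_outliers(readings, tolerance=10.0):
    """Return nodes whose reading differs from the cluster median by more than tolerance.

    readings: {node_name: value}, e.g. CPU temperature in degrees C.
    """
    mid = median(readings.values())
    return sorted(
        node for node, value in readings.items() if abs(value - mid) > tolerance
    )


# Hypothetical readings gathered from four nodes.
temps = {"n01": 52.0, "n02": 54.0, "n03": 53.0, "n04": 78.0}
print(flag_outliers(temps))
```

Using the median rather than the mean keeps one hot node from dragging the reference value toward itself.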
Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf