On Fri, 7 Aug 2009, David Ramirez wrote:

Due to space constraints I am considering implementing an 8-node (+ master)
HPC cluster project using small form factor computers. Knowing that Shuttle is a
reputable brand, with several years in the market, I wonder if any of you
out there has already used them on clusters and how your experience has been
(performance, reliability etc.)

I built a cluster of 80 nodes, which turns 5 this month. It uses the Shuttle SB75G2, which supports ECC and has on-board GigE (Broadcom). The power supply is more than enough for the CPU (Pentium 4 Northwood 3.2GHz), one SATA HDD, a low-power, low-performance graphics card (unfortunately there is no on-board graphics) and an extra GigE card (Intel E1000). I added the extra NIC not because of problems with the Broadcom chip, but simply to have dedicated networks; the Broadcom does PXE just fine, and this is how the nodes have booted ever since they were set up.

I was pleasantly surprised by the reliability of these computers. Because they are so tightly packed, building them requires care and skill, e.g. using good-quality thermal paste to avoid local hot spots and routing cables so that they do not obstruct the airflow. About 70 of the 80 are still running well today. Most of the failed ones stopped working correctly after the 3-year warranty had expired, so I didn't make much effort to find out what was wrong; the main symptom was instability under combined CPU and I/O load. Of course, when failed RAM or HDDs were easy to identify as the cause, they were replaced as needed.

As I wrote earlier on this list, the main disadvantage of such SFFs is the lack of IPMI support. There is no serial console support in the BIOS either, so changing BIOS settings is a pain. Power control can be achieved with a PDU, but I didn't go this way because I knew the nodes should always be up and I wouldn't have to press the power buttons too often ;-) Another thing to keep in mind is that, because they are so tightly packed, they are quite sensitive to the ambient temperature: if the A/C fails, expect a sharp rise in internal temperature, so setting up monitoring, both environmental and of the built-in sensors, is recommended.
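
As a rough illustration of the built-in sensor monitoring mentioned above, here is a minimal sketch in Python. It assumes the nodes run Linux and that a kernel driver exposes the board's sensor chip through the standard hwmon sysfs files (temp*_input, in millidegrees Celsius); whether that holds for a particular Shuttle model is not something this message confirms. The function names and the 60 C limit are made up for the example and would need tuning per node.

```python
from pathlib import Path


def read_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_file: degrees C} for every temp*_input file found.

    Assumes the Linux hwmon sysfs layout, where each temp*_input file
    holds an integer in millidegrees Celsius.
    """
    temps = {}
    for sensor in sorted(Path(hwmon_root).glob("hwmon*/temp*_input")):
        try:
            temps[str(sensor)] = int(sensor.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            # Unreadable or garbled sensor: skip it rather than abort.
            continue
    return temps


def overheated(temps, limit_c=60.0):
    """Return only the sensors at or above limit_c (hypothetical threshold)."""
    return {name: t for name, t in temps.items() if t >= limit_c}


if __name__ == "__main__":
    # Print one warning line per hot sensor; silent when all is well.
    for name, t in sorted(overheated(read_temps_c()).items()):
        print(f"WARNING {name}: {t:.1f} C")
```

Run from cron on each node, with the output mailed or fed into whatever monitoring system is already in place, something like this would give early warning when the A/C fails. It does not replace a separate environmental sensor in the room.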

Good luck!

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de