Hi,

Thanks to everyone who responded to my queries. I've tried to summarise the responses below for others' reference. I hope this is useful.
For the BIOS memory settings, you may want to disable "Node Memory Interleave". Enabling it may decrease memory bandwidth and noticeably increase memory latency (this is supported by the measurements in http://www.digit-life.com/articles2/cpu/rmma-numa.html).

With the K8SRE board in particular, there may be issues with the Linux Broadcom driver in kernels later than 2.6.5, which could cause stability problems under high load. If problems are seen, you may want to use either 2.6.4 or 2.6.16+.

Similarly, there are known issues with the nForce4 chipset which may cause NFS errors or K8SRE shutdowns. You may need an NFS patch if these problems occur.

Enabling ECC scrubbing (for both cache and DRAM) with the longest scrub intervals (normally 84ms) should not have a significant performance impact and should make for a slightly more reliable system. Note that scrubbing with the shortest intervals (i.e. the highest scrub frequency) may impact performance.

Enabling Chipkill should also increase memory reliability without any performance impact, and is recommended.

It is also recommended to use the mcelog package so that any memory errors are recorded at the operating system level.

Thanks to:
Alex Ninaber
Bruce Allen
Mark Hahn
Eric W. Biederman

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org