PS: be sure to use the 'mcelog' utility and package to monitor for ECC
errors. If you have a large number of nodes, this will help to identify
flaky memory and CPUs with cache memory issues.
On Mon, 4 Sep 2006, stephen mulcahy wrote:
Hi Bruce,
Do you have any idea what the performance impact from enabling scrubbing
is on your systems? Did you do any before/after benchmarking?
Thanks,
-stephen
Bruce Allen wrote:
On Sun, 3 Sep 2006, Mark Hahn wrote:
ECC Features
  ECC                     Enabled
  ECC Scrub Redirection   Enabled
  Dram ECC Scrub CTL      Disabled
  Chip-Kill               Disabled
  DCACHE ECC Scrub CTL    Disabled
  L2 ECC Scrub CTL        Disabled
You can find our systems' BIOS/ECC/scrub settings here:
http://www.lsc-group.phys.uwm.edu/beowulf/nemo/construction/BIOS/bios_settings.txt
Our systems are Supermicro H8SSL-i motherboards, with a
Serverworks/Broadcom HT1000 chipset and a single Opteron 175 (dual core,
2.2 GHz).
The ECC part is:
DRAM ECC Enable = Enabled
MCA DRAM ECC Logging = Enabled
DRAM Scrub Redirect = Enabled
DRAM BG Scrub = 2.62ms
L2 Cache BG Scrub = 84.00ms
Data Cache BG Scrub = 84.00ms
Scrubbing is done one cache line (64 bytes) at a time. Thus with 2GB of
memory and a DRAM background scrub interval of 2.62ms we will scrub the
entire memory in approximately:
2 GB/64 Bytes * 2.62 ms = 2^31 / 2^6 * 2.62 ms = 87912 secs
So our choices correspond to one complete scrub of DRAM per day. Our
settings scrub the L2 cache more often: about once every half hour.
Just modify the calculation above, using 1MB instead of 2GB, and 84 ms
instead of 2.62 ms. One finds that the L2 cache is scrubbed about once
every 1376 seconds (every 23 minutes).
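
For anyone who wants to redo this arithmetic for other memory sizes or
scrub intervals, here is a minimal Python sketch. It assumes, as above,
that one 64-byte cache line is scrubbed per interval; the function name
full_scrub_seconds is just a label chosen for this example, not part of
any tool.

```python
CACHE_LINE_BYTES = 64  # assumption from the text: one cache line per scrub step


def full_scrub_seconds(region_bytes, interval_ms, line_bytes=CACHE_LINE_BYTES):
    """Seconds for one full pass when one line is scrubbed every interval_ms."""
    lines = region_bytes // line_bytes
    return lines * interval_ms / 1000.0


# 2 GB of DRAM at a 2.62 ms background scrub interval
dram = full_scrub_seconds(2 * 2**30, 2.62)

# 1 MB of L2 cache at an 84 ms background scrub interval
l2 = full_scrub_seconds(1 * 2**20, 84.0)

print(f"DRAM: {dram:.0f} s ({dram / 3600:.1f} hours)")
print(f"L2:   {l2:.0f} s ({l2 / 60:.1f} minutes)")
```

With the settings listed above this gives roughly 24.4 hours for DRAM
and about 23 minutes for the L2 cache, matching the figures worked out
by hand.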
Cheers,
Bruce
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org