[...] commodity disks are plenty reliable
and are not a significant source of uptime problems.
C|N>K (i.e. coffee piped through nose into keyboard)
sorry!
That's not quite a general truth. 8^)
I mean that in the experience of my organization,
the mundane Maxtor and Seagate disks that we get
with our mostly HP hardware are extremely reliable.
surprisingly so - we were certainly expecting worse,
based on the published studies.
we have ~20 clusters online totalling >8k cores.
most nodes (2-4 cores/node) have 2 SATA disks, which
have had a very low failure rate (probably < 1% AFR over
2-3 years of service). in addition, we have four 70TB
storage clusters built from arrays of 9+2 RAIDs of
commodity 250G SATA disks, as well as a 200TB cluster
(10+2x500G disks iirc). the failure rate of these disks
has been quite low as well (I'm guessing actually lower
than the in-node disks, even though the storage-cluster
disks are much more heavily used.)
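just to put those numbers in perspective, here's a quick back-of-envelope sketch (the per-node core count and exact AFR are my assumptions, since the post only gives ranges):

```python
# Rough expected-failure count for the in-node disks described above.
# Assumptions (mine, not from the post): 8000 cores, 3 cores/node on
# average (the post says 2-4), 2 disks per node, 1% annualized failure
# rate (the post says "probably < 1% AFR").
cores = 8000
cores_per_node = 3
disks_per_node = 2
afr = 0.01

nodes = cores // cores_per_node
disks = nodes * disks_per_node
expected_failures_per_year = disks * afr
print(f"~{nodes} nodes, ~{disks} disks, "
      f"~{expected_failures_per_year:.0f} expected disk failures/year")
```

so even at the pessimistic end, that's on the order of one failed disk a week across the whole fleet - low enough that it's handled as routine maintenance rather than an uptime problem.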
here's my handwaving explanation of this: in-node disks are
hardly used, since they're just the OS, and nodes spend most
of their time running apps. disks in the storage clusters
are more heavily used, but even for a large cluster, we
simply do not generate enough load. (I'm not embarrassed by
that - remember cheap disks sustain 50 MB/s these days, so
if you have a 70 TB Lustre filesystem, you'd have to sustain
10 GB/s to actually keep the disks busy. in other words,
bigger storage is generally less active...)
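the arithmetic behind that handwave, spelled out (disk size and per-disk streaming rate are taken from the post; counting only data disks in the 9+2 arrays is my assumption):

```python
# Aggregate bandwidth needed to keep every spindle of a 70TB Lustre
# filesystem busy. Assumptions (mine): capacity counts only data disks
# of the 9+2 arrays, 250GB per disk, ~50 MB/s sustained per disk, and
# decimal units as disk vendors count them.
fs_capacity_gb = 70 * 1000   # 70 TB
disk_gb = 250
disk_mb_s = 50

data_disks = fs_capacity_gb // disk_gb
aggregate_gb_s = data_disks * disk_mb_s / 1000
print(f"{data_disks} data disks -> ~{aggregate_gb_s:.0f} GB/s "
      "to keep every disk busy")
```

that works out to ~14 GB/s of raw streaming bandwidth - the same order of magnitude as the ~10 GB/s figure above, and either way far beyond what a typical cluster workload actually sustains.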
but maybe it makes sense not to fight the tide of disturbingly cheap
and dense storage. even a normal 1U cluster node can often be configured
with several TB of local storage. the question is: how to make use of it?
Some people are running dCache pools on their cluster nodes.
that's cool to know. how do users like it? performance comments?
thanks, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf