[...] commodity disks are plenty reliable
and are not a significant source of uptime problems.
C|N>K (i.e. coffee piped through nose into keyboard)
sorry!
That's not quite a general truth. 8^)
I mean that in the experience of my organization,
the mundane Maxtor and Seagate disks that we get
with our mostly HP hardware are extremely reliable.
surprisingly so - we were certainly expecting worse,
based on the published studies.
we have ~20 clusters online totalling >8k cores.
most nodes (2-4 cores/node) have 2 SATA disks, which
have had a very low failure rate (probably < 1% AFR over
2-3 years of service). in addition, we have four 70TB
storage clusters built from arrays of 9+2 RAIDs of
commodity 250G SATA disks, as well as a 200TB cluster
(10+2x500G disks iirc). the failure rate of these disks
has been quite low as well (I'm guessing actually lower
than the in-node disks, even though the storage-cluster
disks are much more heavily used.)
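just to put those numbers in perspective, here's a quick back-of-envelope sketch (the per-node core count and exact AFR are my assumptions, since the post only gives ranges):

```python
# Rough expected-failure count for the in-node disks described above.
# Assumptions (mine, not from the post): 8000 cores, 3 cores/node on
# average (the post says 2-4), 2 disks per node, 1% annualized failure
# rate (the post says "probably < 1% AFR").
cores = 8000
cores_per_node = 3
disks_per_node = 2
afr = 0.01

nodes = cores // cores_per_node
disks = nodes * disks_per_node
expected_failures_per_year = disks * afr
print(f"~{nodes} nodes, ~{disks} disks, "
      f"~{expected_failures_per_year:.0f} expected disk failures/year")
```

so even at the pessimistic end, that's on the order of one failed disk a week across the whole fleet - low enough that it's handled as routine maintenance rather than an uptime problem.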
here's my handwaving explanation of this: in-node disks are
hardly used, since they're just the OS, and nodes spend most
of their time running apps. disks in the storage clusters
are more heavily used, but even for a large cluster, we
simply do not generate enough load. (I'm not embarrassed by
that - remember cheap disks sustain 50 MB/s these days, so
if you have a 70 TB Lustre filesystem, you'd have to sustain
10 GB/s to actually keep the disks busy. in other words,
bigger storage is generally less active...)
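the arithmetic behind that handwave, spelled out (disk size and per-disk streaming rate are taken from the post; counting only data disks in the 9+2 arrays is my assumption):

```python
# Aggregate bandwidth needed to keep every spindle of a 70TB Lustre
# filesystem busy. Assumptions (mine): capacity counts only data disks
# of the 9+2 arrays, 250GB per disk, ~50 MB/s sustained per disk, and
# decimal units as disk vendors count them.
fs_capacity_gb = 70 * 1000   # 70 TB
disk_gb = 250
disk_mb_s = 50

data_disks = fs_capacity_gb // disk_gb
aggregate_gb_s = data_disks * disk_mb_s / 1000
print(f"{data_disks} data disks -> ~{aggregate_gb_s:.0f} GB/s "
      "to keep every disk busy")
```

that works out to ~14 GB/s of raw streaming bandwidth - the same order of magnitude as the ~10 GB/s figure above, and either way far beyond what a typical cluster workload actually sustains.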
but maybe it makes sense not to fight the tide of disturbingly cheap
and dense storage. even a normal 1U cluster node can often be configured
with several TB of local storage. the question is: how to make use of it?
Some people are running dCache pools on their cluster nodes.
that's cool to know. how do users like it? performance comments?
thanks, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf