On Fri, Jul 22, 2011 at 12:33:37AM -0400, Mark Hahn wrote:

> storage isn't about performance any more. ok, hyperbole, a little.
> but even a cheap disk does > 100 MB/s, and in all honesty, there are
> not tons of people looking for bandwidth more than a small multiplier
> of that. sure, a QDR fileserver wants more than a couple disks,
> and if you're an iops-head, you're going flash anyway.

Over in the big data world, we're all about disk bandwidth, because we take the computation to the data. When we're reading something for a Map/Reduce job, we can easily drive 800 MB/s off of 8 disks in a single node, and for many jobs the most expensive part of the job is the reading. Good thing we have 3 copies of every bit of data; that gives us 1/3 the runtime.

Writing, not so happy: every write has to land on all 3 copies, and network bandwidth is a lot more expensive than disk bandwidth.

Some data manipulations in HPC are like Map/Reduce. For example, shooting a movie using saved state files is embarrassingly parallel. The first system I heard about that took computation to the data was from SDSC, long before GOOG was founded.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
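
A back-of-envelope sketch of the read arithmetic in the message above, in Python. It assumes perfectly even scaling across disks and replicas, with no scheduling or network overhead, which real jobs will not achieve. The function name, the parameter names, and the 800 GB dataset size are hypothetical and chosen only for illustration. The 100 MB/s per disk, 8 disks per node, and 3 copies come from the thread.

```python
def read_time_seconds(data_mb, disks_per_node=8, mb_per_s_per_disk=100.0, replicas=1):
    """Idealized scan time when each replica sits on a different node
    and every copy can serve local reads in parallel (an assumption)."""
    if data_mb < 0 or disks_per_node <= 0 or mb_per_s_per_disk <= 0 or replicas <= 0:
        raise ValueError("sizes and rates must be positive")
    # Aggregate bandwidth of one node: 8 disks * 100 MB/s = 800 MB/s.
    node_bandwidth = disks_per_node * mb_per_s_per_disk
    # Each replica adds another node's worth of local read bandwidth.
    return data_mb / (node_bandwidth * replicas)


# Hypothetical 800 GB dataset (800,000 MB).
one_copy = read_time_seconds(800_000, replicas=1)
three_copies = read_time_seconds(800_000, replicas=3)
print(f"1 copy:   {one_copy:.0f} s")
print(f"3 copies: {three_copies:.0f} s")
```

Under these assumptions the 3-copy read takes one third as long as the 1-copy read. The sketch says nothing about writes, where each of the 3 copies has to be written and most of that traffic crosses the network.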