Vincent, seer of seers, prognosticator of prognosticators, Grendel of Grendels, answer me this (ironically, this will get us back to the OP's question, in a form at least, which would be just swell):
"You are charged with creating the most efficient and cost-effective cache layer possible using retail pricing and commodity components. Let us assume that, in order to keep this conversation about bandwidth going, this cache is geared to perform well for highly sequential read workloads. Further, since (naturally) not all data is hot at the same time, given the imaginary and unspecified size of this parallel file system, we know that in any given week there's only about 250GB of data that is really, seriously utilized over and over and therefore good for caching. However, this needs to be fed fast -- 3GB/s for instance. Ignore the network; this can be one monolithic PFS with local cache. Yes, this diverges from the OP's question, but few took that seriously anyhow."

Summary: What's the cheapest, fastest, read-bandwidth-optimized caching medium for a single machine, serving hot data of about 0.25TB at ideally 3GB/s to keep the machine busy?

Examination (I'm using tomshardware for quick performance numbers and pricing figures -- drop the non-retail convo Vince, nobody is buying it):

Rough (for reads) MB/s/$ of HDD (>=250GB): Ranges from 1 to 2.5
Rough (for reads) MB/s/$ of SSD (>=250GB): Ranges from 1 to 2.9

Let's consider real examples at the top of my quick perusal for each category. For the SSD, I'll use the Samsung 840, costing about $178 and delivering about 520MB/s, giving it about 2.9MB/s/$. For the HDD, I'll use the Toshiba DT01ACA100, costing around $73 and delivering about 185MB/s, giving it about 2.5MB/s/$.

So, with my best HDD, I'll need about 16 of them to deliver the 3GB/s figure I want, which will cost me in aggregate $1168. For my best SSD, I'll need only about 6 of them, which will cost me $1068.

So this discussion about SSDs being pointless for bandwidth should (hopefully) be over. They can be used for bandwidth acceleration, particularly (as the OP mentioned) if used on the compute node when a weak network link sits between it and the PFS.
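For anyone who wants to check the arithmetic, here is a rough sketch of the sizing above. The prices and read rates are just the retail figures quoted in this post; the function name `size_array` and the reading of 3GB/s as 3000MB/s are my own assumptions. It reports both the "about N" drive count (rounded to nearest) and the strict count needed to actually reach the target.

```python
import math

# Assumption: the 3GB/s target is taken as 3000 MB/s.
TARGET_MBPS = 3000

# (retail price in USD, sequential read MB/s), as quoted in the post.
DRIVES = {
    "Toshiba DT01ACA100 (HDD)": (73, 185),
    "Samsung 840 (SSD)": (178, 520),
}


def size_array(price_usd, read_mbps, target_mbps=TARGET_MBPS):
    """Hypothetical helper: size a striped set of identical drives.

    Assumes read bandwidth scales linearly with drive count, which
    ignores controller, bus, and filesystem overheads.
    """
    approx = round(target_mbps / read_mbps)      # the "about N" figure
    strict = math.ceil(target_mbps / read_mbps)  # N that meets the target
    return {
        "mbps_per_dollar": read_mbps / price_usd,
        "approx_count": approx,
        "approx_cost": approx * price_usd,
        "strict_count": strict,
        "strict_cost": strict * price_usd,
    }


if __name__ == "__main__":
    for name, (price, mbps) in DRIVES.items():
        r = size_array(price, mbps)
        print(f"{name}: {r['mbps_per_dollar']:.2f} MB/s/$, "
              f"~{r['approx_count']} drives (${r['approx_cost']}), "
              f"strictly {r['strict_count']} drives (${r['strict_cost']})")
```

Note that 16 HDDs at 185MB/s come to 2960MB/s, a hair short of 3000MB/s, so a strict reading needs 17 of them; the SSD count is 6 either way.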
In those cases, there is rarely space enough to shove all the HDDs Vincent is espousing in there, and therefore SSDs are ideal solutions whether you want a cache for latency or bandwidth.

If, on the other hand, we are talking about building general filers with huge capacity, minimized cost, and sequential workloads, of course HDDs rock. But we aren't/weren't/never have been talking about that.

Best,

ellis

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf