On 04/07/2014 04:40 PM, Ellis H. Wilson III wrote:
On 04/07/2014 03:44 PM, Prentice Bisbal wrote:
As long as you use enterprise-grade SSDs (e.g., Intel's stuff) with
overprovisioning, the NAND endurance shouldn't be an issue over the
lifetime of a cluster. We've used SSDs as our nodes' system disks for
a few years now (going on four with our oldest, 324-node production
system), and there have been no major problems. The problems we did
have all came when we were using cheaper commodity SSDs. Don't give in
to the temptation to save a few pennies there.
Thanks for the info. Enterprise typically means MLC instead of SLC,
right?
After reading the link below, I think I got my concept of SLC and MLC
backwards. Sorry; I'm not an SSD expert by any stretch of the imagination.
There is a lot of cruft to filter through in the SSD space to
understand what the hell is really going on. First of all,
"enterprise" can really mean anything, but one thing is for certain:
it is more expensive. Enterprise can mean the same material (flash
cells) but a different (better) flash translation layer
(wear-leveling/garbage collection/etc) or a different feature size
(bigger generally means more reliable and less dense), or some fancier
fab tech. There's a blog with more info on the topic here:
http://www.violin-memory.com/blog/flash-flavors-mlc-emlc-or-vmlc-part-i/
Thanks. This was an excellent read.
Either way, "more bits-per-cell" are generally seen as less enterprise
than "fewer bits-per-cell." So, SLC is high-reliability enterprise,
MLC can be enterprise in some cases (marketing has even taken it upon
itself to brand some "eMLC" or enterprise MLC, which has about as much
meaning as can be expected) and TLC is arguably just commodity. Fewer
bits per cell also means lower latencies, particularly for writes/erases.
I guess I disagree with the previous poster that going the commodity
route (which, by the way, saves not pennies but often upwards of 50%)
is always bad. It really depends on your situation/use-case. I
wouldn't store permanent data on outright commodity SSDs, but as a
LOCAL scratch-pad they can be brilliant (and replacing them later may
be far more advisable than spending a ton up front and praying they
never wear out).
For instance, since you mention Hadoop, you are in a good situation to
consider commodity SSDs, since Hadoop will automatically fail over to
another node if one node's SSD dies. It's not going to kill off your
whole job; Hadoop is built to cope with that. That being said,
I am not suggesting you necessarily should go the route of putting
HDFS on SSDs. The bandwidth and capacity concerns you raise are spot
on there. What I am suggesting is perhaps using a few commodity SSDs
for your mapreduce.cluster.local.dir, or where your intermediate data
will reside. You suggest "not many users will take advantage of
this." If your core application is Hadoop, every user will take
advantage of these SSDs (unless they explicitly override the tmp path,
which is possible), and the gains can be significant over HDDs.
Moreover, you aren't multiplexing persistent and temporary data
onto/from your HDFS HDDs, so you can see speedups on persistent data
as well, since you've effectively created dedicated storage pools for
the two types of access. This can be important.
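For concreteness, pointing intermediate data at the SSDs is just a
local-dir setting in mapred-site.xml. A minimal sketch, assuming the
SSDs are mounted at /ssd0 and /ssd1 (those mount points are made up;
adjust for your layout):

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/ssd0/mapred/local,/ssd1/mapred/local</value>
  </property>

Listing multiple directories lets the framework spread spill files
across both drives. Depending on your Hadoop version, the older
mapred.local.dir name may be what applies instead.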
I don't know much about Hadoop, but this seems like a good
middle-ground. I like technology to be transparent to users. If they
have to do something specific on their end to improve performance,
nine times out of ten they won't, either because they're unaware the
feature is available, don't understand the impact of using it, or are
just too lazy to change their habits.
Caveat #1: Make sure to determine how much temporary space will be
needed, and acquire enough SSDs to cover that across the cluster.
That, or instruct your users that "jobs generating up to X TB of
intermediate data can run on the SSDs, which is done by default, but
for jobs exceeding that, use these extra parameters to send the tmp
data to HDDs." More complexity, though. Depends on the user-base.
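To put rough (and purely hypothetical) numbers on it: if your largest
jobs shuffle around 10 TB of intermediate data across 50 nodes, that
works out to roughly 200 GB per node at the peak, so a 400 GB SSD (or
a pair of 200 GB drives) per node leaves comfortable headroom, and
anything bigger falls back to the HDD path.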
That could be a problem, because I really don't know the user habits
yet. Since this will be my first Hadoop resource, I have no job-size
data to go from. Also, see my previous comment.
Caveat #2: If you're building incredibly stacked boxes (e.g., 16+
HDDs), you may be resource-limited in a number of ways that make
adding SATA SSDs unwise. It may not be worth the effort to squeeze
more drives in there, and PCIe SSDs (which tend to be more enterprise
anyhow) might be the way to go.
Caveat #3: Only certain types of Hadoop jobs really hammer
intermediate space. Read- and write-intensive jobs often won't, but
those special ones that do (e.g., Sort) benefit immensely from a fast
intermediate space.
If that's the case, then your suggestion may not be worth it at this
point, since I don't have any usage data to determine whether it's a
wise investment.
Caveat #4: There are probably more caveats. My advice is to build
two mock-up machines, with and without SSDs, and run a "baby cluster"
Hadoop instance. That way, if SSDs really don't bring the performance
gains you want, you avoid buying a bunch of them, wasting money, and
probably spending time replacing them down the road.
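If you do get test boxes, a quick way to stress the intermediate
space is the stock TeraGen/TeraSort examples that ship with Hadoop.
Roughly (the jar path and row count below are placeholders; match them
to your install and scratch capacity):

  hadoop jar hadoop-mapreduce-examples-*.jar teragen 1000000000 /bench/tg-in
  hadoop jar hadoop-mapreduce-examples-*.jar terasort /bench/tg-in /bench/ts-out

TeraGen writes 100-byte rows (so a billion rows is about 100 GB), and
TeraSort is exactly the kind of shuffle-heavy job from Caveat #3, so
comparing its runtime with the local dirs on SSD versus HDD should
tell you quickly whether the drives pay for themselves.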
This is the ideal approach. Unfortunately, I don't have the time or
resources for this. :(
More on Wear-Out: This is becoming an issue again for /modern/
feature sizes and commodity bit-levels (especially TLC). For drives a
generation or two back in feature size, fancy wear-leveling and
egregious amounts of over-provisioning have more or less made wear-out
impossible within the lifetime of your machine.
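If you do go commodity, it's also worth watching actual wear rather
than guessing. smartmontools will report the drive's own wear
counters, e.g. (/dev/sdX is a placeholder for the device in question):

  smartctl -A /dev/sdX

On Intel drives, look at the Media_Wearout_Indicator attribute (it
starts at 100 and counts down); other vendors expose similar counters
under different names (e.g., Wear_Leveling_Count), so check your
model's documentation.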
Best,
ellis