On 04/07/2014 04:40 PM, Ellis H. Wilson III wrote:
On 04/07/2014 03:44 PM, Prentice Bisbal wrote:
As long as you use enterprise-grade SSDs (e.g., Intel's stuff) with
overprovisioning, the NAND endurance shouldn't be an issue over the
lifetime of a cluster.  We've used SSDs as our nodes' system disks for
a few years now (going on four with our oldest, a 324-node production
system), and there have been no major problems.  The major problems
happened when we were using the cheaper commodity SSDs.  Don't give in
to the temptation to save a few pennies there.

Thanks for the info. Enterprise typically means MLC instead of SLC, right?

After reading the link below, I think I had SLC and MLC backwards. Sorry. I'm not an SSD expert by any stretch of the imagination.

There is a lot of cruft to filter through in the SSD space to understand what the hell is really going on. First of all, "enterprise" can really mean anything, but one thing is for certain: it is more expensive. Enterprise can mean the same material (flash cells) but a different (better) flash translation layer (wear-leveling, garbage collection, etc.), or a different feature size (bigger generally means more reliable and less dense), or some fancier fab tech. Blog with more info on the topic here:

http://www.violin-memory.com/blog/flash-flavors-mlc-emlc-or-vmlc-part-i/

Thanks. This was an excellent read.

Either way, more bits per cell is generally seen as less "enterprise" than fewer bits per cell. So, SLC is high-reliability enterprise, MLC can be enterprise in some cases (marketing has even taken it upon itself to brand some of it "eMLC," or enterprise MLC, which has about as much meaning as can be expected), and TLC is arguably just commodity. Fewer bits per cell also means lower latencies, particularly for writes/erases.

I guess I disagree with the previous poster that going the commodity route, which, by the way, saves not pennies but often upwards of 50%, is always bad. It really depends on your situation/use-case. I wouldn't store permanent data on outright commodity SSDs, but as a LOCAL scratch-pad they can be brilliant (and replacing them later may be far more advisable than spending a ton up front and praying they never fail).

For instance, since you mention Hadoop, you are in a good position to consider commodity SSDs, since Hadoop will automatically fail over to another node if one node's SSD dies. It's not going to kill your whole job; Hadoop is built to cope with that. That said, I am not suggesting you necessarily go the route of putting HDFS on SSDs. The bandwidth and capacity concerns you raise are spot on there. What I am suggesting is perhaps using a few commodity SSDs for mapreduce.cluster.local.dir, i.e., where your intermediate data will reside. You suggest "not many users will take advantage of this." If your core application is Hadoop, every user will take advantage of these SSDs (unless they explicitly override the tmp path, which is possible), and the gains over HDDs can be significant. Moreover, you aren't multiplexing persistent and temporary data onto/off your HDFS HDDs, so you can see speedups on persistent data as well, since you've effectively created dedicated storage pools for the two types of access. This can be important.
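For concreteness, a minimal sketch of what that might look like in mapred-site.xml on each node, assuming the SSDs are mounted at /ssd0 and /ssd1 (the mount points and paths here are just hypothetical examples, and the exact property name depends on your Hadoop version -- older releases call it mapred.local.dir):

  <!-- point intermediate/spill space at the SSDs; the framework spreads
       its local data across the comma-separated directories -->
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/ssd0/mapred/local,/ssd1/mapred/local</value>
  </property>

Listing one directory per SSD spreads the intermediate spills across both drives without any extra work on the users' part.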

I don't know much about Hadoop, but this seems like a good middle ground. I like technology to be transparent to users. If they have to do something specific on their end to improve performance, 9 times out of 10 they won't, either because they're unaware the feature is available, don't understand the impact of using it, or are just too lazy to change their habits.

Caveat #1: Make sure to determine how much temporary space will be needed, and acquire enough SSDs to cover that across the cluster. That, or instruct your users that "jobs generating up to X TB of intermediate data can run on the SSDs, which is done by default, but for jobs exceeding that, use these extra parameters to send the tmp data to the HDDs." More complexity, though. Depends on the user-base.
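One crude way to get a handle on that number, if you have any representative jobs at all to run on existing hardware, is simply to watch the high-water mark of whatever the current local/intermediate directory is while they run (the path below is just an example; point it at your actual local dir):

  # poll intermediate-space usage once a minute during a test job
  while true; do
      date; du -sh /data0/mapred/local; df -h /data0; sleep 60
  done
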

That could be a problem, because I really don't know the users' habits yet. This will be my first Hadoop resource, so I have no job-size data to go from. Also, see my previous comment.

Caveat #2: If you're building incredibly stacked boxes (e.g., 16+ HDDs) you may be resource-limited in a number of ways that make adding SATA SSDs unwise. It may not be worth the effort to squeeze more SSDs in there, or PCIe SSDs (which tend to be more enterprise-grade anyhow) might be the way to go.

Caveat #3: Only certain types of Hadoop jobs really hammer intermediate space. Read- and write-intensive jobs often won't, but those special ones that do (e.g., Sort) benefit immensely from a fast intermediate space.

If that's the case, then your suggestion may not be worth it at this point, since I don't have any usage data to determine whether it's a wise investment.

Caveat #4: There are probably more caveats. My advice is to build two mock-up machines, one with SSDs and one without, and run a "baby cluster" Hadoop instance on each. This way, if SSDs really don't bring the performance gains you want, you avoid buying a bunch of them, wasting money, and probably wasting time replacing them down the road.
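If you ever do get even a couple of spare nodes to play with, the stock Hadoop examples jar makes that A/B comparison cheap to run. Something like the following (sizes and paths are placeholders, and the examples jar name varies between Hadoop releases):

  # generate ~100GB of input (1e9 100-byte rows), then sort it; run the
  # sort once with the local dirs on SSD and once on HDD, and compare
  # the job wall-clock times
  hadoop jar hadoop-*examples*.jar teragen 1000000000 /bench/tera-in
  hadoop jar hadoop-*examples*.jar terasort /bench/tera-in /bench/tera-out

TeraSort is about as shuffle-heavy as it gets, so it gives you something close to an upper bound on what fast intermediate space can buy you.
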
This is the ideal approach. Unfortunately, I don't have the time or resources for this. :(

More on Wear-Out: This is becoming an issue again at /modern/ feature sizes and commodity bit-levels (especially TLC). For drives built on feature sizes a generation or two back, fancy wear-leveling and egregious amounts of over-provisioning have more or less made wear-out impossible within the lifetime of your machine.
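If you do end up putting commodity drives in the scratch role, it is cheap insurance to keep an eye on their SMART wear counters anyway, e.g. with smartmontools (the device name below is just an example, and attribute names vary by vendor -- Intel drives expose Media_Wearout_Indicator, others report something like Wear_Leveling_Count):

  # dump SMART attributes for the scratch SSD and pull out the
  # wear-related counters
  smartctl -A /dev/sdb | grep -i -E 'wear|media'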

Best,

ellis

