> http://www.hpcwire.com/hpcwire/2013-02-28/utility_supercomputing_heats_up.html
well, it's HPCwire - I always assume their name is an acknowledgement that their content is much like "HPC PR wire", often or mostly vendor-sponsored. call me ivory-tower, but this sort of thing:

    Cycle has seen at least two examples of real-world MPI applications
    that ran as much as 40 percent better on the Amazon EC2 cloud than on
    an internal kit that used QDR InfiniBand.

really PISSES ME OFF. it's insulting to the reader. let's first assume it's not a lie - next we should ask "how can that be?" EC2 has a smallish amount of virtualization overhead and a weak interconnect, so why would it be faster? AFAICT, the only possible explanation is that the "internal kit" was just plain botched. or else they're comparing apples to oranges (say, different-vintage CPU/RAM, or the app was sensitive to a particular cache size, associativity, SSE level, etc.) in other words, these examples do not inform the topic of the article, which is the viability of cloud/utility HPC. the article then concludes "well, you should try it (us) because it doesn't cost much". instead I say: yes, gather data, and when it indicates your "kit" is botched, fix your kit.

I have to add: I've almost never seen a non-fluff quote from IDC. the ones in this article are doozies.

> that are only great in straight lines ;-) Another thing to think of is
> total cost per unit of science. Given we can now exploit much larger

people say a lot of weaselly things in the guise of TCO. I do not really understand why cloud/utility is not viewed with a lot more suspicion. AFAICT, people's thinking gets incredibly sloppy in this area, and they start accepting articles of faith like "Economies of Scale". yes, there is no question that some things get cheaper at large scale. but even if we model that as a monotonic increase in efficiency, it's highly nonlinear. the cost components:

1. capital cost of hardware.
2. operating costs: power, cooling, rent, connectivity, licenses.
3. staff operating costs.
big operations probably get some economy on large-scale HW purchases, but it's foolish to think this is giant: why would your HW vendor not want to maintain decent margins?

power/cooling/rent are mostly strictly linear once you get past trivial clusters (say, a few tens of racks). certainly some economy is possible, but there isn't much room to work with: power runs about 10% of purchase cost per year, a mediocre PUE makes that 13%, and since we're talking cloud, rent is off the table. I know Google/FB/etc. manage PUEs of near 1.0 and site their facilities to get better power prices. I suspect they do not get half-priced power, though - and even that would only take the operating component of TCO down to 5%. at the rate CPU speed and power are improving, they probably care more about accelerated amortization.

staff: box-monkeying is strictly linear with the size of the cluster, but can be extremely low (do you even bother to replace broken stuff?). actual sysadmin/system programming is *not* a function of the size of the facility at all, or at least not directly - diversity of nodes and/or environments is what costs you system-person time. you can certainly model this, but it's not really part of the TCO examination, since you have to pay it either way.

in short, scale matters, but not much. so in a very crude sense, cloud/utility computing is really just asking another company to make a profit from you. if you did it yourself, you'd simply not be making money for Amazon - everything else could be the same. Amazon has no special sauce, just a fairly large amount of DIN-standard ketchup. unless outsource-mania is really a reflection of doubts about competence: if we insource, we're vulnerable to having incompetent staff.

the one place where cloud/utility outsourcing makes the most sense is at small scale. if you don't have enough work to keep tens of racks busy, then there are some scaling and granularity effects.
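the power arithmetic above fits in a few lines. to be clear, the 10%-of-capex baseline, the PUE values, and the half-price-power factor are my own assumptions from the paragraph above, not measured data:

```python
# A minimal sketch of the back-of-envelope power arithmetic above.
# The 10%-of-capex baseline, the PUE values, and the half-price-power
# factor are the assumptions stated in the text, not measured data.

def annual_power_fraction(base_ratio, pue, price_factor=1.0):
    """Yearly power+cooling spend as a fraction of hardware purchase cost."""
    return base_ratio * pue * price_factor

BASE = 0.10  # power ~10% of purchase cost per year at an ideal PUE of 1.0

mediocre = annual_power_fraction(BASE, pue=1.3)                     # typical site
best_case = annual_power_fraction(BASE, pue=1.0, price_factor=0.5)  # Google-ish

print(f"mediocre PUE: {mediocre:.0%} of capex/year")   # 13% of capex/year
print(f"best case:    {best_case:.0%} of capex/year")  # 5% of capex/year
```

even the implausibly good case only moves the operating line from 13% to 5% of capex per year - which is why amortization of the hardware itself dominates.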
you probably can't hire 3% of a sysadmin, and some of your nodes will be idle at times... I'm a little surprised there aren't more cloud cooperatives, where smaller companies pool their resources to form a non-profit entity to get past these dis-economies of very small scale. fundamentally, I think it's just that almost anyone thrust into a management position is phobic about risk. I certainly see that in the organization where I work (essentially an academic HPC coop.)

people really like EC2. that's great. but they shouldn't be deluded into thinking it's efficient: Amazon is making a KILLING on EC2.

> systems than some of us have internally, are we are starting to see
> overhead issues of vanish due to massive scale, certainly at cost? I know

eh? numbers please. I see significant overheads only on quite small systems.

> for a fact that what we call "Pleasantly Parallel" workloads all of this

hah. people forget that "embarrassingly parallel" is a reference to the "embarrassment of riches" idiom. perhaps a truer name would be "independent concurrency".

> I personally think the game is starting to change a little bit yet again
> here...

I don't. HPC clusters have been "PaaS cloud providers" from the beginning. outsourcing is really a statement about an organization's culture: an assertion that if it insources, it will do so badly. it's interesting that some very large organizations like GE are in the middle of a reaction against outsourcing...

regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf