On 04/15/2013 02:21 PM, Prentice Bisbal wrote:
> "High performance computing (HPC) is a form of computer usage where
> utilization of one of the computer subsystems (processor, RAM, disk,
> network, etc.) is at or near 100% capacity for extended periods of time."

It would be helpful to me if we could clarify what the end goal is in defining "HPC." Scientists classify things when it is helpful to do so, but since the mantra of our science is "it depends," and we need to hear about all of your goals/details/code/etc. anyhow, trying to then put you back into a general category seems moot. It is not as if, after hearing those goals, we'd say, "Oh, you're clearly trying to solve an HPC problem, so take this push-button HPC solution!" Not that simple, I fear, so I wonder about the utility of drawing imaginary lines in the sand, unless this is the Beo-marketeering list. In which case, please let me know so I can unsubscribe now :D

Getting back to the article, I am particularly troubled by a number of seemingly obvious issues (at least to me, but I could be very wrong) in its comparison of cloud costs to purchasing one's own machines:

"Over three years [to purchase and run your own servers], the total is US$ 2,096,000. On the other hand, using cloud computing via Cycle Computing...over the three years, the price is about US$ 974,025. Cloud computing works out to half the cost of a dedicated system for these workloads."

Issue #1: This is my biggest issue. Where in the world is there just ONE, isolated researcher with a million-dollar budget spread over three years? Find another researcher to split a cluster with and you match the cloud cost. Split it four ways and you pay about half the price of Cycle Computing. Or just buy one fifth of the compute and wait 600 seconds instead of 120, or an hour and fifteen minutes instead of fifteen minutes, to use both of the examples provided. (A quick sketch of this arithmetic is at the end of this message.)

Along these lines, I am not buying the argument that some researcher out there has a completely embarrassingly parallel (EP) problem (basically a set of scripts) and is blaming the scheduler for having to wait to run concurrently on 50k or 100k cores. That is his or her own fault. Just break the problem into many jobs (50k or 100k separate jobs would be fine), and so long as the machine isn't busy with somebody else's jobs, your scheduler isn't broken, and you haven't burnt up your credits under your institution's scheduling policies, you should get through far quicker than waiting for the entire giant cluster to empty out so your pointlessly huge job can run a bunch of totally discrete tasks. (A minimal job-chunking sketch is also at the end of this message.) Maybe someone can clarify something I'm missing about WHY these tasks need to run at exactly the same time?? What's wrong with your job running NOT concurrently over 2 or 3 hours? You're going to wait that long to get a set of instances that large anyhow. And if a bunch of million-dollar-toting researchers constantly find themselves maxing out the cluster and waiting too long, just spend that cash on owned compute to expand it. That's a good problem to have (having money and spending it on equipment you will still own after the day is over).

Issue #2: I think this is too "black and white" an evaluation of owned compute versus rented compute. What you probably really want is to own some static amount of compute that will be saturated most of the time, and then rent compute for the big bursts (e.g. before conferences or some other research push). Ideally, there would be some kind of scheduling mechanism (maybe there is one already; please share if you know of it) that lets you transparently expand your private cluster with rented public cloud capacity for those bursts, so research can go on with the exact same commands and job-scheduling expectations. Maybe a tad slower on the public cloud machines, but nevertheless it will go on. (A toy policy sketch follows below.)
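
For what it's worth, here is the back-of-envelope arithmetic behind the cost-splitting point in Issue #1. The two dollar figures are the article's; the N-way sharing scenario and the little Python script are mine, purely as an illustration:

    # Article's 3-year totals; the N-way sharing scenario is hypothetical.
    OWNED_3YR = 2096000   # purchase and run your own servers
    CLOUD_3YR = 974025    # Cycle Computing estimate

    for n in (1, 2, 4):
        per_head = float(OWNED_3YR) / n
        print("{0} researcher(s) sharing: ${1:,.0f} each, "
              "{2:.2f}x the cloud price".format(n, per_head,
                                                per_head / CLOUD_3YR))

Two researchers sharing come out roughly even with the cloud quote; four come out at about half of it.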
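
And a minimal sketch of the "just make it many jobs" point. It assumes an SGE-style qsub that passes arguments through to the job script, and a hypothetical run_chunk.sh worker that takes a start index and a count; adjust for whatever scheduler you actually run:

    import subprocess

    N_TASKS = 50000   # total independent EP tasks (the article's scale)
    CHUNK = 100       # tasks per job; pick whatever your queue likes

    # Submit many small, independent jobs the scheduler can start as
    # cores free up, instead of one job demanding 50k cores at once.
    for start in range(0, N_TASKS, CHUNK):
        count = min(CHUNK, N_TASKS - start)
        subprocess.check_call(["qsub", "run_chunk.sh",
                               str(start), str(count)])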
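
Finally, a toy sketch of the "own the baseline, rent the bursts" policy from Issue #2. Everything here is hypothetical glue: queued_wait_estimate, submit_local, and submit_cloud stand in for whatever your scheduler and cloud-provisioning tooling actually expose.

    MAX_ACCEPTABLE_WAIT = 4 * 3600   # seconds; a made-up site policy knob

    def place_job(job, queued_wait_estimate, submit_local, submit_cloud):
        """Run on owned hardware unless the local queue is badly backed up."""
        if queued_wait_estimate(job) <= MAX_ACCEPTABLE_WAIT:
            return submit_local(job)
        # Burst: same job, same commands, just rented (and maybe slower) nodes.
        return submit_cloud(job)
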
Best,
ellis