Doug,

Thanks for posting that video. It confirmed what I always suspected about clouds for HPC.
Prentice

On 10/03/2011 08:25 AM, Douglas Eadline wrote:
> Interesting and pragmatic HPC cloud presentation, worth watching
> (25 minutes)
>
> http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/
>
> --
> Doug
>
>> http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
>>
>> $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
>>
>> By Jon Brodkin | Published September 20, 2011 10:49 AM
>>
>> Amazon EC2 and other cloud services are expanding the market for high-performance computing. Without access to a national lab or a supercomputer in your own data center, cloud computing lets businesses spin up temporary clusters at will and stop paying for them as soon as the computing needs are met.
>>
>> A vendor called Cycle Computing is on a mission to demonstrate the potential of Amazon’s cloud by building increasingly large clusters on the Elastic Compute Cloud. Even with Amazon, building a cluster takes some work, but Cycle combines several technologies to ease the process and recently used them to create a 30,000-core cluster running CentOS Linux.
>>
>> The cluster, announced publicly this week, was created for an unnamed “Top 5 Pharma” customer, and ran for about seven hours at the end of July at a peak cost of $1,279 per hour, including the fees to Amazon and Cycle Computing. The details are impressive: 3,809 compute instances, each with eight cores and 7GB of RAM, for a total of 30,472 cores, 26.7TB of RAM, and 2PB (petabytes) of disk space. Security was ensured with HTTPS, SSH, and 256-bit AES encryption, and the cluster ran across data centers in three Amazon regions in the United States and Europe. The cluster was dubbed “Nekomata.”
>>
>> Spreading the cluster across multiple continents was done partly for disaster recovery purposes, and also to guarantee that 30,000 cores could be provisioned. “We thought it would improve our probability of success if we spread it out,” Cycle Computing’s Dave Powers, manager of product engineering, told Ars. “Nobody really knows how many instances you can get at any one time from any one [Amazon] region.”
>>
>> Amazon offers its own special cluster compute instances, at a higher cost than regular-sized virtual machines. These cluster instances provide 10 Gigabit Ethernet networking along with greater CPU and memory, but they weren’t necessary to build the Cycle Computing cluster.
>>
>> The pharmaceutical company’s job, related to molecular modeling, was “embarrassingly parallel,” so a fast interconnect wasn’t crucial. To further reduce costs, Cycle took advantage of Amazon’s low-price “spot instances.” To manage the cluster, Cycle Computing used its own management software as well as the Condor High-Throughput Computing software and Chef, an open source systems integration framework.
>>
>> Cycle demonstrated the power of the Amazon cloud earlier this year with a 10,000-core cluster built for a smaller pharma firm called Genentech. Now, 10,000 cores is a relatively easy task, says Powers. “We think we’ve mastered the small-scale environments,” he said. 30,000 cores isn’t the end game, either. Going forward, Cycle plans bigger, more complicated clusters, perhaps ones that will require Amazon’s special cluster compute instances.
>>
>> The 30,000-core cluster may or may not be the biggest one run on EC2. Amazon isn’t saying.
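As an aside on the “spot instances” mentioned above: for anyone curious what bidding for spot capacity looks like programmatically, here is a minimal Python sketch using the boto library, which speaks the EC2 API. The region, AMI ID, bid price, and instance count below are hypothetical placeholders, not Cycle's actual setup:

    import boto.ec2

    # One region shown; Nekomata spanned three (US and Europe), so a
    # real run would repeat this per region.
    conn = boto.ec2.connect_to_region("us-east-1")

    # Bid for spot capacity: instances run while the market price stays
    # under the bid, and billing follows the (usually lower) market price.
    # The AMI ID and bid are made-up placeholders.
    requests = conn.request_spot_instances(
        price="0.35",               # max bid in USD per instance-hour
        image_id="ami-12345678",    # a CentOS image with the app baked in
        count=100,                  # instances requested in this batch
        instance_type="c1.xlarge",  # 8 cores / 7GB RAM, matching the article
    )

    for req in requests:
        print req.id, req.state     # poll until requests become 'active'

A real 30,000-core run would presumably loop this over several regions with much larger counts, with Condor pulling jobs onto nodes as they come up.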
>> “I can’t share specific customer details, but can tell you that we do have businesses of all sizes running large-scale, high-performance computing workloads on AWS [Amazon Web Services], including distributed clusters like the Cycle Computing 30,000 core cluster to tightly-coupled clusters often used for science and engineering applications such as computational fluid dynamics and molecular dynamics simulation,” an Amazon spokesperson told Ars.
>>
>> Amazon itself actually built a supercomputer on its own cloud that made it onto the list of the world’s Top 500 supercomputers. With 7,000 cores, the Amazon cluster ranked number 232 in the world last November with speeds of 41.82 teraflops, falling to number 451 in June of this year. So far, Cycle Computing hasn’t run the Linpack benchmark to determine the speed of its clusters relative to Top 500 sites.
>>
>> But Cycle’s work is impressive no matter how you measure it. The job performed for the unnamed pharma company “would take well over a week for them to run internally,” Powers says. In the end, the cluster performed the equivalent of 10.9 “compute years of work.”
>>
>> The task of managing such large cloud-based clusters forced Cycle to step up its own game, with a new plug-in for Chef the company calls Grill.
>>
>> “There is no way that any mere human could keep track of all of the moving parts on a cluster of this scale,” Cycle wrote in a blog post. “At Cycle, we’ve always been fans of extreme IT automation, but we needed to take this to the next level in order to monitor and manage every instance, volume, daemon, job, and so on in order for Nekomata to be an efficient 30,000 core tool instead of a big shiny on-demand paperweight.”
>>
>> But problems did arise during the 30,000-core run.
>>
>> “You can be sure that when you run at massive scale, you are bound to run into some unexpected gotchas,” Cycle notes. “In our case, one of the gotchas included such things as running out of file descriptors on the license server. In hindsight, we should have anticipated this would be an issue, but we didn’t find that in our prelaunch testing, because we didn’t test at full scale. We were able to quickly recover from this bump and keep moving along with the workload with minimal impact. The license server was able to keep up very nicely with this workload once we increased the number of file descriptors.”
>>
>> Cycle also hit a speed bump related to volume and byte limits on Amazon’s Elastic Block Store volumes. But the company is already planning bigger and better things.
>>
>> “We already have our next use-case identified and will be turning up the scale a bit more with the next run,” the company says. But ultimately, “it’s not about core counts or terabytes of RAM or petabytes of data. Rather, it’s about how we are helping to transform how science is done.”
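The file-descriptor gotcha above is worth a closer look: every open client connection to the license server consumes one descriptor, and a typical Linux default soft limit is 1024, which a swarm of ~30,000 cores' worth of clients exhausts almost immediately. A process can inspect and raise its own limit with Python's standard resource module; the sketch below just illustrates the mechanism and is not Cycle's actual fix. Raising the hard limit itself requires root (e.g. via /etc/security/limits.conf) before the server starts:

    import resource

    # Current (soft, hard) open-file limits for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print "file descriptor limits: soft=%d hard=%d" % (soft, hard)

    # Raise the soft limit as far as the hard limit allows. Going past
    # the hard limit needs root (ulimit -Hn / limits.conf) before the
    # license server process is launched.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))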
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf