Interesting and pragmatic HPC cloud presentation, worth watching (25 minutes)
http://insidehpc.com/2011/09/30/video-the-real-future-of-cloud-computing/

-- Doug

> http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
>
> $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
> By Jon Brodkin | Published September 20, 2011 10:49 AM
>
> Amazon EC2 and other cloud services are expanding the market for
> high-performance computing. Without access to a national lab or a
> supercomputer in your own data center, cloud computing lets businesses
> spin up temporary clusters at will and stop paying for them as soon as
> the computing needs are met.
>
> A vendor called Cycle Computing is on a mission to demonstrate the
> potential of Amazon's cloud by building increasingly large clusters on
> the Elastic Compute Cloud. Even with Amazon, building a cluster takes
> some work, but Cycle combines several technologies to ease the process
> and recently used them to create a 30,000-core cluster running CentOS
> Linux.
>
> The cluster, announced publicly this week, was created for an unnamed
> "Top 5 Pharma" customer, and ran for about seven hours at the end of
> July at a peak cost of $1,279 per hour, including the fees to Amazon
> and Cycle Computing. The details are impressive: 3,809 compute
> instances, each with eight cores and 7GB of RAM, for a total of 30,472
> cores, 26.7TB of RAM and 2PB (petabytes) of disk space. Security was
> ensured with HTTPS, SSH and 256-bit AES encryption, and the cluster ran
> across data centers in three Amazon regions in the United States and
> Europe. The cluster was dubbed "Nekomata."
>
> Spreading the cluster across multiple continents was done partly for
> disaster recovery purposes, and also to guarantee that 30,000 cores
> could be provisioned. "We thought it would improve our probability of
> success if we spread it out," Cycle Computing's Dave Powers, manager of
> product engineering, told Ars. "Nobody really knows how many instances
> you can get at any one time from any one [Amazon] region."
>
> Amazon offers its own special cluster compute instances, at a higher
> cost than regular-sized virtual machines. These cluster instances
> provide 10 Gigabit Ethernet networking along with greater CPU and
> memory, but they weren't necessary to build the Cycle Computing
> cluster.
>
> The pharmaceutical company's job, related to molecular modeling, was
> "embarrassingly parallel," so a fast interconnect wasn't crucial. To
> further reduce costs, Cycle took advantage of Amazon's low-price "spot
> instances." To manage the cluster, Cycle Computing used its own
> management software as well as the Condor High-Throughput Computing
> software and Chef, an open source systems integration framework.
>
> Cycle demonstrated the power of the Amazon cloud earlier this year with
> a 10,000-core cluster built for a smaller pharma firm called Genentech.
> Now, 10,000 cores is a relatively easy task, says Powers. "We think
> we've mastered the small-scale environments," he said. 30,000 cores
> isn't the end game, either. Going forward, Cycle plans bigger, more
> complicated clusters, perhaps ones that will require Amazon's special
> cluster compute instances.
>
> The 30,000-core cluster may or may not be the biggest one run on EC2.
> Amazon isn't saying.
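[The spot-instance angle is the part most of us can actually try at small
scale. Purely as an illustration of the mechanics, here is a minimal
sketch using the current boto3 Python library; Cycle did not use this
code -- per the article they drove everything through their own
management software plus Condor and Chef -- and the region, price cap,
instance count, AMI ID and key pair below are all placeholders.]

    # Minimal sketch: bid for a few EC2 spot instances with boto3.
    # All concrete values (region, price cap, AMI, key pair) are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.request_spot_instances(
        SpotPrice="0.10",          # maximum price per instance-hour, in USD
        InstanceCount=8,           # ask for eight instances in one request
        Type="one-time",           # 'persistent' would re-bid after interruption
        LaunchSpecification={
            "ImageId": "ami-xxxxxxxx",    # a CentOS image of your choosing
            "InstanceType": "c1.xlarge",  # 8 cores / 7GB, like the article's nodes
            "KeyName": "my-keypair",      # placeholder key pair name
        },
    )

    for req in response["SpotInstanceRequests"]:
        print(req["SpotInstanceRequestId"], req["State"])

[In practice you would also watch the spot price history and design the
workload to tolerate losing instances when the market price moves above
your bid, which is why spot capacity suits embarrassingly parallel jobs
like this one.]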
> "I can't share specific customer details, but can tell you that we do
> have businesses of all sizes running large-scale, high-performance
> computing workloads on AWS [Amazon Web Services], including distributed
> clusters like the Cycle Computing 30,000 core cluster to tightly-coupled
> clusters often used for science and engineering applications such as
> computational fluid dynamics and molecular dynamics simulation," an
> Amazon spokesperson told Ars.
>
> Amazon itself actually built a supercomputer on its own cloud that made
> it onto the list of the world's Top 500 supercomputers. With 7,000
> cores, the Amazon cluster ranked number 232 in the world last November
> with speeds of 41.82 teraflops, falling to number 451 in June of this
> year. So far, Cycle Computing hasn't run the Linpack benchmark to
> determine the speed of its clusters relative to Top 500 sites.
>
> But Cycle's work is impressive no matter how you measure it. The job
> performed for the unnamed pharma company "would take well over a week
> for them to run internally," Powers says. In the end, the cluster
> performed the equivalent of 10.9 "compute years of work."
>
> The task of managing such large cloud-based clusters forced Cycle to
> step up its own game, with a new plug-in for Chef the company calls
> Grill.
>
> "There is no way that any mere human could keep track of all of the
> moving parts on a cluster of this scale," Cycle wrote in a blog post.
> "At Cycle, we've always been fans of extreme IT automation, but we
> needed to take this to the next level in order to monitor and manage
> every instance, volume, daemon, job, and so on in order for Nekomata to
> be an efficient 30,000 core tool instead of a big shiny on-demand
> paperweight."
>
> But problems did arise during the 30,000-core run.
>
> "You can be sure that when you run at massive scale, you are bound to
> run into some unexpected gotchas," Cycle notes. "In our case, one of
> the gotchas included such things as running out of file descriptors on
> the license server. In hindsight, we should have anticipated this would
> be an issue, but we didn't find that in our prelaunch testing, because
> we didn't test at full scale. We were able to quickly recover from this
> bump and keep moving along with the workload with minimal impact. The
> license server was able to keep up very nicely with this workload once
> we increased the number of file descriptors."
>
> Cycle also hit a speed bump related to volume and byte limits on
> Amazon's Elastic Block Store volumes. But the company is already
> planning bigger and better things.
>
> "We already have our next use-case identified and will be turning up
> the scale a bit more with the next run," the company says. But
> ultimately, "it's not about core counts or terabytes of RAM or
> petabytes of data. Rather, it's about how we are helping to transform
> how science is done."
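[The file-descriptor gotcha is worth remembering, and it is easy to check
for before a big run. A typical license daemon holds a socket per client
checkout, so with tens of thousands of cores the common Linux default
soft limit of 1024 open files disappears fast. A tiny Python sketch (my
illustration, not anything from Cycle's write-up) that reports the
per-process limit and raises the soft limit toward the hard limit; on a
real license server you would raise the system-wide limits, e.g. in
/etc/security/limits.conf, rather than per process:]

    # Sketch: report this process's open-file limit and raise the soft limit
    # toward the hard limit. Illustration only; a busy license server should
    # have its system-wide limits raised instead of doing this per process.
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("file descriptors: soft=%d hard=%d" % (soft, hard))

    wanted = 65536
    # An unprivileged process may raise its soft limit up to the hard limit.
    new_soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("after raise:      soft=%d hard=%d" % (soft, hard))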
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf