Out of interest, how large are the compute jobs (memory, runtime, etc.)? How easy was it to get them to fit into a serverless environment?
Thanks,

Guy

On Tue, 21 Sept 2021 at 13:02, Tim Cutts <t...@sanger.ac.uk> wrote:

> I think that’s exactly the situation we’ve been in for a long time, especially in life sciences, and it’s becoming more entrenched. My experience is that the average user of our scientific computing systems has been becoming less technically savvy for many years now.
>
> The presence of the cloud makes that more acute, in particular because it makes it easy for the user to effectively throw more hardware at the problem, which reduces the incentive to make their code particularly fast or efficient. Cost is the only brake on it, and in many cases I’m finding the PI doesn’t actually care about that. They care that a result is being obtained (and it’s time to first result they care about, not time to complete all the analysis), and so they typically don’t have much time for those of us who are telling them they need to invest time up front in developing and optimising efficient code.
>
> And cost is not necessarily the brake I thought it was going to be anyway. One recent project we’ve done on AWS has impressed me a great deal. It’s not terribly CPU efficient, and would doubtless, with sufficient effort, run much more efficiently on premise. But it’s extremely elastic in its nature, and so a good fit for the cloud. Once a week, the project has to completely re-analyse the 600,000+ COVID genomes we’ve sequenced so far, looking for new branches in the phylogenetic tree, and to complete that analysis inside 8 hours. Initial attempts to naively convert the HPC implementation to run on AWS looked as though they were going to be very expensive (~$50k per weekly run). But a fundamental reworking of the entire workflow to make it as cloud native as possible, by which I mean almost exclusively serverless, has succeeded beyond what I expected. The total cost is <$5,000 a month, and because there is essentially no statically configured infrastructure at all, the security is fairly easy to be comfortable about. And all of that was done with no detailed thinking about whether the actual algorithms running in the containers are at all optimised in a traditional HPC sense. It’s just not needed for this particular piece of work. Did it need software developers with hardcore knowledge of performance optimisation? No. Was it rapid to develop and deploy? Yes. Is the performance fast enough for UK national COVID variant surveillance? Yes. Is it cost effective? Yes. Sold! The one thing it did need was knowledgeable cloud architects, but the cloud providers can and do help with that.
>
> Tim
>
> --
> Tim Cutts
> Head of Scientific Computing
> Wellcome Sanger Institute
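Out of curiosity about what “almost exclusively serverless” looks like concretely, the pattern I imagine is something like the sketch below: a scheduled Lambda that fans the weekly re-analysis out to AWS Batch, so nothing is provisioned between runs. To be clear, this is my own rough illustration, not Tim’s actual pipeline; the bucket, queue and job-definition names are invented.

    # Rough illustration only (not the actual pipeline): a Lambda handler that
    # fans a weekly re-analysis out to AWS Batch.  All resource names are made up.
    import re
    import boto3

    s3 = boto3.client("s3")
    batch = boto3.client("batch")

    def handler(event, context):
        """Triggered once a week, e.g. by an EventBridge schedule."""
        submitted = []
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket="genome-batches", Prefix="weekly/"):
            for obj in page.get("Contents", []):
                job = batch.submit_job(
                    # Batch job names only allow letters, digits, '-' and '_'
                    jobName=re.sub(r"[^A-Za-z0-9_-]", "-", obj["Key"])[:128],
                    jobQueue="phylo-spot-queue",    # hypothetical queue: no idle nodes between runs
                    jobDefinition="tree-analysis",  # container image holding the analysis code
                    containerOverrides={
                        "environment": [{"name": "INPUT_KEY", "value": obj["Key"]}],
                    },
                )
                submitted.append(job["jobId"])
        return {"submitted": len(submitted)}

The interesting part, for my question above, is the per-task shape: Lambda itself currently tops out around 10 GB of memory and 15 minutes of runtime, so presumably anything bigger lands in the Batch containers rather than in the functions themselves.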
> On 21 Sep 2021, at 12:24, John Hearns <hear...@gmail.com> wrote:
>
> Some points well made here. I have seen in the past job scripts passed on from graduate student to graduate student - the case I am thinking of was an Abaqus script for 8 core systems, being run on a new 32 core system. Why WOULD a graduate student question a script given to them - which works. They should be getting on with their science. I guess this is where Research Software Engineers come in.
>
> Another point I would make is about modern processor architectures, for instance AMD Rome/Milan. You can have different NUMA-per-socket (NPS) options, which affect performance. We set the preferred IO path, which I have seen myself to have an effect on the latency of MPI messages. If you are not concerned about your hardware layout you would just go ahead and run, missing a lot of performance.
>
> I am now going to be controversial and comment that over in Julia land the pattern these days seems to be that people develop on their own laptops, or maybe local GPU systems. There is a lot of microbenchmarking going on. But there seems to be not a lot of thought given to CPU pinning or what happens with hyperthreading. I guess topics like that are part of HPC 'Black Magic' - though I would imagine the low latency crowd are hot on them.
>
> I often introduce people to the excellent lstopo/hwloc utilities, which show the layout of a system. Most people are pleasantly surprised to discover them.

--
Dr. Guy Coates
+44(0)7801 710224
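PS: on John’s point about CPU pinning and hyperthreading, below is a quick, Linux-only way to see (and change) which logical CPUs a process is allowed to use. This is just my own illustration; lstopo/hwloc remain the right tools for seeing the full picture of NUMA nodes, caches and SMT siblings before deciding what to pin where.

    # Linux-only illustration: inspect and change a process's CPU affinity.
    # Which CPUs make sense to pin to depends entirely on the machine's
    # topology (NUMA nodes, SMT siblings) -- check lstopo first.
    import os

    PID = 0  # 0 means "the calling process"

    print("Allowed CPUs:", sorted(os.sched_getaffinity(PID)))

    # Pin to the first four logical CPUs, e.g. to stay on one NUMA node and
    # away from hyperthread siblings (whether CPUs 0-3 actually achieve that
    # is machine-specific).
    os.sched_setaffinity(PID, {0, 1, 2, 3})
    print("Now restricted to:", sorted(os.sched_getaffinity(PID)))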
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf