I think that’s exactly the situation we’ve been in for a long time, especially
in life sciences, and it’s becoming more entrenched. My experience is that the
average user of our scientific computing systems has been becoming less
technically savvy for many years now.
The presence of the cloud makes that more acute, in particular because it makes
it easy for the user to effectively throw more hardware at the problem, which
reduces the incentive to make their code particularly fast or efficient. Cost
is the only brake on it, and in many cases I’m finding the PI doesn’t actually
care about that. They care that a result is being obtained (and it’s time to
first result they care about, not time to complete all the analysis), and so
they typically don’t have much time for those of us who are telling them they
need to invest time up front in developing and optimising efficient code.
And cost is not necessarily the brake I thought it was going to be anyway. One
recent project we’ve done on AWS has impressed me a great deal. It’s not
terribly CPU efficient, and would doubtless, with sufficient effort, run much
more efficiently on premise. But it’s extremely elastic in its nature, and so
a good fit for the cloud. Once a week, the project has to completely
re-analyse the 600,000+ COVID genomes we’ve sequenced so far, looking for new
branches in the phylogenetic tree, and to complete that analysis inside 8
hours. Initial attempts to naively convert the HPC implementation to run on
AWS looked as though they were going to be very expensive (~$50k per weekly
run). But a fundamental reworking of the entire workflow to make it as cloud
native as possible, by which I mean almost exclusively serverless, has
succeeded beyond what I expected. The total cost is <$5,000 a month, and
because there is essentially no statically configured infrastructure at all,
the security is fairly easy to be comfortable about. And all of that was done
with no detailed thinking about whether the actual algorithms running in the
containers are at all optimised in a traditional HPC sense. It’s just not
needed for this particular piece of work. Did it need software developers with
hardcore knowledge of performance optimisation? No. Was it rapid to develop
and deploy? Yes. Is the performance fast enough for UK national COVID variant
surveillance? Yes. Is it cost effective? Yes. Sold! The one thing it did
need was knowledgeable cloud architects, but the cloud providers can and do
help with that.
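
The two cost figures above are quoted per weekly run and per month, so a rough normalisation helps when comparing them. This is only a back-of-envelope sketch: it assumes 52 weekly runs a year spread evenly over 12 months, and it treats the "<$5,000 a month" figure as an upper bound.

```python
WEEKS_PER_YEAR = 52     # assumption: one run every week of the year
MONTHS_PER_YEAR = 12

def weekly_to_monthly(cost_per_week: float) -> float:
    """Convert a per-week cost into an average per-month cost."""
    return cost_per_week * WEEKS_PER_YEAR / MONTHS_PER_YEAR

naive_monthly = weekly_to_monthly(50_000)  # naive HPC port: ~$50k per weekly run
serverless_monthly = 5_000                 # reworked workflow: upper bound, <$5,000 a month
reduction = naive_monthly / serverless_monthly

print(f"naive port:  ~${naive_monthly:,.0f} per month")
print(f"serverless:  <${serverless_monthly:,} per month")
print(f"reduction:   at least ~{reduction:.0f}x")
```

On those assumptions the naive port comes out at a little over $200k a month, so the reworked version is cheaper by a factor of 40 or more.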
Tim
--
Tim Cutts
Head of Scientific Computing
Wellcome Sanger Institute
On 21 Sep 2021, at 12:24, John Hearns <hear...@gmail.com> wrote:
Some points well made here. I have seen in the past job scripts passed on from
graduate student to graduate student - the case I am thinking of was an Abaqus
script for 8 core systems, being run on a new 32 core system. Why WOULD a
graduate student question a script given to them, when it works? They should be
getting on with their science. I guess this is where Research Software
Engineers come in.
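
One way to stop a hard-coded core count being inherited along with the script is to have the job ask how many cores it has actually been given. The following is a minimal Python sketch, not taken from the script described above. It assumes a Linux node, and it assumes the scheduler is Slurm, which exports SLURM_CPUS_PER_TASK when a per-task CPU count has been requested; the function name available_cores is made up for illustration.

```python
import os

def available_cores(env=None) -> int:
    """Return how many cores this job may use, rather than hard-coding 8.

    Order of preference:
    1. SLURM_CPUS_PER_TASK, if set (assumption: the scheduler is Slurm).
    2. The CPU affinity mask of this process (Linux only).
    3. os.cpu_count() as a last resort.
    """
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

# Simulated scheduler environment for a 32 core allocation.
print(available_cores({"SLURM_CPUS_PER_TASK": "32"}))
# No scheduler variable: fall back to what the operating system reports.
print(available_cores({}))
```

The result can then be passed to whatever core-count option the solver takes, so the same script follows the hardware when it moves from 8 core to 32 core nodes.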
Another point I would make is about modern processor architectures, for
instance AMD Rome/Milan. You can have different NUMA Per Socket options, which
affect performance. We set the preferred IO path - which I have seen myself to
have an effect on latency of MPI messages. If you are not concerned about your
hardware layout you would just go ahead and run, missing a lot of performance.
I am now going to be controversial and comment that over in Julia land the
pattern these days seems to be that people develop on their own laptops, or maybe
local GPU systems. There is a lot of microbenchmarking going on. But there
seems to be not a lot of thought given to CPU pinning or what happens with
hyperthreading. I guess topics like that are part of HPC 'Black Magic' - though
I would imagine the low latency crowd are hot on them.
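
To take some of the mystery out of pinning and hyperthreading, here is a rough Python sketch of what they involve on Linux. It assumes the kernel exposes thread_siblings_list files under /sys/devices/system/cpu, and it uses os.sched_getaffinity and os.sched_setaffinity, which exist on Linux only. The helper names are made up for illustration, and the policy of keeping one hardware thread per core is just one possible choice, not a recommendation for every workload.

```python
import glob
import os

def parse_cpu_list(text: str) -> list:
    """Parse the Linux sysfs CPU list format, e.g. "0-3,8" -> [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus

def one_thread_per_core(sibling_lists) -> list:
    """Keep only the lowest-numbered hardware thread of each physical core."""
    return sorted({min(parse_cpu_list(s)) for s in sibling_lists if s.strip()})

def read_sibling_lists() -> list:
    """Read every CPU's hyperthread sibling list from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lists.append(fh.read())
    return lists

def pin_to_physical_cores() -> set:
    """Restrict this process to one hardware thread per core it is allowed on."""
    allowed = os.sched_getaffinity(0)
    targets = set(one_thread_per_core(read_sibling_lists())) & allowed
    if targets:  # leave the mask untouched if sysfs told us nothing
        os.sched_setaffinity(0, targets)
    return os.sched_getaffinity(0)

# Made-up example: 2 cores with 2 hardware threads each (CPUs 0/4 and 1/5).
print(one_thread_per_core(["0,4\n", "1,5\n", "0,4\n", "1,5\n"]))
```

Calling pin_to_physical_cores() before starting a benchmark would keep two measurement threads from sharing one physical core, which is one of the effects that a microbenchmark run on a laptop can easily hide.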
I often introduce people to the excellent lstopo/hwloc utilities which show the
layout of a system. Most people are pleasantly surprised to find this.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf