On 10/13/20 3:12 PM, Oddo Da wrote:
Jim, Peter: by "things have not changed in the tooling" I meant that it is
the same approach/paradigm as it was when I was in HPC back in the late
1990s/early 2000s. Even with books about Open MPI: you can go on their
mailing list and ask what to read, and you will be pointed to the same
stuff published 20+ years ago, with maybe one or two books that are
"fresher" than that (I did exactly that a few months ago, naively
thinking that things had changed ;-) ).
The approach is still the same - you have to write the code at the low
level and worry about everything. It would be nice if this were improved
and things were abstracted up and away a bit. The appearance of Spark,
for example, did exactly that for data science/machine learning/"big
data" - especially when you write it in Scala (functional programming).
It makes for all sorts of cleaner, more abstracted, more correct code,
where the framework worries about the underlying data/computation
locality, the communication between all the machinery, and so on, and
you are left to worry about the problem you are solving. I just feel
that in the HPC world we have not moved to this point yet, and I am
trying to understand why.
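To make that concrete, here is a minimal Spark-in-Scala sketch of the kind
of code I mean (the file names and columns are made up for illustration).
Nothing in it says which node holds which partition or how the shuffle
traffic moves; the framework decides that, and the same program runs on a
laptop or on a cluster:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StationMeans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("station-means")
      .getOrCreate()

    // Hypothetical input: one row per observation, columns "station" and "temp".
    val obs = spark.read.parquet("observations.parquet")

    // The groupBy implies a shuffle across the cluster, but the code never
    // mentions ranks, messages, or data placement.
    val means = obs.groupBy("station")
      .agg(avg("temp").as("mean_temp"))

    means.write.parquet("station_means.parquet")
    spark.stop()
  }
}

Compare that with the equivalent MPI program, where the decomposition, the
sends/receives and the reduction are all yours to write - and to get wrong.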
I mean, let's say I was a data science researcher at a university and
all that was on offer was the traditional HPC cluster - what tooling
would I use to do my research? The whole world is doing something else,
but I am stuck worrying about the low-level details... or I need to ask
for a separate HDFS/Spark cluster? What if I want to stream data from
somewhere, as is commonly done in industry (solutions like Kafka etc.)?
My only options are to stand up a local cluster (which costs time,
money, and ongoing admin/maintenance) or to go to AWS or Azure and spend
taxpayer money to fill corporate coffers for what should already be a
solved problem, given the money that was already spent on all the
hardware at the university.
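For the streaming case, what I have in mind is roughly the following
sketch (hedged: the broker address and topic name are placeholders, and it
assumes the spark-sql-kafka connector is on the job's classpath):

import org.apache.spark.sql.SparkSession

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-ingest").getOrCreate()

    // Subscribe to a Kafka topic; "broker:9092" and "sensor-readings" are placeholders.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "sensor-readings")
      .load()

    // Kafka records arrive as key/value byte arrays; cast the value to text.
    val readings = stream.selectExpr("CAST(value AS STRING) AS reading")

    // Print each micro-batch to the console; a real job would write to storage.
    val query = readings.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}

That is the kind of pipeline that is routine in industry, but that a
traditional batch-scheduled cluster gives you no obvious way to host.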
BTW, Spark is just an example of how tooling/methodologies have improved
in industry in the domain of distributed computation. This is why I
thought that Julia may be one of those things that provides a different
(improved?) way of doing things, where both the climate modeling guys
and the data science guys can use the same HPC hardware...
A number of countries have national HPC infrastructures. Small and
moderate allocations on XSEDE or similar allow people to get some
experience with HPC without their institution investing in significant
computational resources. The problem is usually that knowledge of how to
use these resources is then scarce at an institution without any HPC
resources of its own.
A typical university cluster can run data science workloads with Spark,
Hadoop, etc.; it just requires the admins to make this possible. Systems
like Comet are built for this kind of work:
https://portal.xsede.org/sdsc-comet
Once jobs start using tens to hundreds of thousands of core hours,
taxpayer money (and probably also the environment) is saved by writing
in a low-level language.
A small number of countries design entirely new systems and train their
students to write/port software for them - much as happened with
bleeding-edge systems 20 years ago :)
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf