Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

Prentice Bisbal via Beowulf Mon, 12 Oct 2020 12:19:52 -0700

I'm not an expert on Big Data at all, but I hear the phrase "Hadoop" less and less these days. Where I work, most data analysts are using R, Python, or Spark in the form of PySpark. For machine learning, most of the researchers I support are using Python tools like TensorFlow or PyTorch.

I don't know much about Julia replacing MPI, etc., but I wish I did. I would like to know more about Julia.


Prentice

On 10/12/20 12:14 PM, Oddo Da wrote:

Hello,
I used to be in HPC back when we built Beowulf clusters by hand ;) and wrote code in C/pthreads, PVM and MPI, and back when anyone could walk into fields like bioinformatics: all that was needed was a pulse, some C and Perl and a desire to do ;-). Then I left for the private sector and stumbled into "big data" some years later - I wrote a lot of code in Spark and Scala, worked in infrastructure to support it, etc.
Then I went back (in 2017) to HPC. I was surprised to find that not much has changed - researchers and grad students still write code in MPI and C/C++ and maybe some Python or R for visualization or localized data analytics. I also noticed that it was not easy to "marry" things like big data with HPC clusters - tools like Spark/Hadoop do not really have the same underlying infrastructure assumptions as do things like MPI/supercomputers. However, I find it wasteful for a university to run separate clusters to support a data science/big data load vs traditional HPC.
I then stumbled upon languages like Julia - I like its approach: code is data, visualization is easy, decent ML/DS tooling.
How does it fare on a traditional HPC cluster? Are people using it to substitute their MPI loads? On the opposite side, has it caught up to Spark in terms of DS/ML quality of offering? In other words, can it be used as a one-fell-swoop unifying substitute for both opposing approaches?
I realize that many people have already committed to certain tech/paradigms, but this is mostly educational debt (if MPI, or Spark on the other side, is working for me, why go to something different?) - but is there anything substantial stopping new people with no debt starting out in a different approach (offerings like Julia)?
I do not have too much experience with Julia (and hence may be barking up the wrong tree) - in that case I am wondering what people are doing to "marry" the loads of traditional HPC with "big data" as practiced by the commercial/industry entities on a single underlying hardware offering. I know there are things like Twister2, but it is unclear to me (from cursory examination) what it actually offers in the context of my questions above.
Any input, corrections, schooling me etc. are appreciated.

Thank you!

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov

