Re: [Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
Doug, thank you for taking the time! Your Julia comments are in line with my impression of it, hence the initial question I posed in this thread. Thank you for all your insights. On Tue, Oct 13, 2020 at 5:03 PM Douglas Eadline wrote: > > > On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline > > wro

Re: [Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Douglas Eadline
> On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline > wrote: > >> >> It really depends on what you need to do with Hadoop or Spark. >> IMO many organizations don't have enough data to justify >> standing up a 16-24 node cluster system with a PB of HDFS. >> > > Excellent. If I understand what you ar

[Beowulf] ***UNCHECKED*** Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline wrote: > > It really depends on what you need to do with Hadoop or Spark. > IMO many organizations don't have enough data to justify > standing up a 16-24 node cluster system with a PB of HDFS. > Excellent. If I understand what you are saying, ther

[Beowulf] ***UNCHECKED*** Re: Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Douglas Eadline
> On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline > wrote: > >> >> The reality is almost all Analytics projects require multiple >> tools. For instance, Spark is great, but if you do some >> data munging of CSV files and want to store your results >> at scale you can't write a single file to your

[Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline wrote: > > The reality is almost all Analytics projects require multiple > tools. For instance, Spark is great, but if you do some > data munging of CSV files and want to store your results > at scale you can't write a single file to your local file

Re: [Beowulf] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Douglas Eadline
> On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline > wrote: > >> >> Spark is a completely separate code base that has its own Map Reduce >> engine. It can work stand-alone, with the YARN scheduler, or with >> other schedulers. It can also take advantage of HDFS. >> > > Doug, this is correct. I thi

Re: [Beowulf] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline wrote: > > Spark is a completely separate code base that has its own Map Reduce > engine. It can work stand-alone, with the YARN scheduler, or with > other schedulers. It can also take advantage of HDFS. > Doug, this is correct. I think for all pra

Re: [Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Douglas Eadline
> Hi Doug, > > How have they managed to squeeze so much performance out of java for such > big data sets? Nothing to do with Java, originally had to do with "moving computation to data" (Hadoop YARN can provide data locality for Map Reduce, i.e. large files are sliced on HDFS data nodes, the Map
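Doug's answer credits "moving computation to data": large files are sliced into blocks on HDFS data nodes, and the map work is scheduled where those blocks already live. The Python sketch below is a toy illustration of that idea under stated assumptions. The block ids, node names, and the one-task-per-slot model are hypothetical, and this is not YARN's actual scheduling algorithm.

```python
from typing import Dict, List


def schedule_map_tasks(block_locations: Dict[str, List[str]],
                       slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one map task per block, preferring a node that stores the block.

    block_locations: hypothetical block id -> nodes holding a replica of it.
    slots: hypothetical node -> number of map tasks it can still accept.
    """
    free = dict(slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    for block, replicas in block_locations.items():
        # Prefer a node-local placement: the computation moves to the data.
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fall back to any node with a free slot; the block would then
            # have to be read over the network instead of from local disk.
            remote = [n for n, s in free.items() if s > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            node = remote[0]
        free[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n1"]}
    print(schedule_map_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

In this example `b1` and `b2` run node-local, while `b3` lands on `n3` because its only replica holder, `n1`, is already busy; that remote read is exactly the cost that locality-aware placement tries to avoid.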

Re: [Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Jonathan Aquilina via Beowulf
Hi Doug, How have they managed to squeeze so much performance out of Java for such big data sets? Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Douglas Eadline Sent: 13 October 2020 15:55 To: Oddo Da Cc: beowulf@beowulf.org Subject: [Beowulf] ***UNCHECKED*** Re: Spar

[Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Douglas Eadline
I have noticed a lot of Hadoop/Spark references in the replies. The word "Hadoop" is probably the most misunderstood word in computing today and many people have a somewhat vague idea of what it actually is. Hadoop V1 was a monolithic Map Reduce framework written in Java. (BTW Map Reduce is a SIMD al
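As a minimal illustration of the Map Reduce pattern Doug refers to, and not Hadoop's Java API, the following Python sketch runs a word count through explicit map, shuffle, and reduce phases. The function names are hypothetical; the point is that one map function is applied independently to every input record, which is what lets a framework spread that phase across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group emitted values by key, as the framework does between the phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key; for a word count this is a sum.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # The same map function runs on each line independently, so in a real
    # framework these calls could execute on different nodes in parallel.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["spark and hadoop", "hadoop and mpi"]))
```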

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Michael Di Domenico
On Tue, Oct 13, 2020 at 8:52 AM Guy Coates wrote: > > Having just spent some time looking at parallelising some ML/AI workloads, it > was enlightening to see that as you scratch beneath the various frameworks > like pytorch or horovod, you find...MPI. And RDMA. And workloads that can > quickly

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Michael Di Domenico
On Tue, Oct 13, 2020 at 8:39 AM Oddo Da wrote: > > Michael, thank you for the insight. I think Hadoop in general is mostly > dying, Spark is really the derivative that took off. Basically, what you are > saying is that there is no demand on your infra for this kind of work. Do you > have any in

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Guy Coates
> . mpi is becoming a bit > like cobol. everyone claims it's dead, but yet it's still around > > Having just spent some time looking at parallelising some ML/AI workloads, it was enlightening to see that as you scratch beneath the various frameworks like pytorch or horovod, you find...MPI. And RD

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Jonathan Aquilina via Beowulf
Looking at the question I posted earlier, how is it that Java is so high-performing when it comes to large data sets? Regards, Jonathan From: Beowulf On Behalf Of Oddo Da Sent: 13 October 2020 14:38 To: Michael Di Domenico Cc: Beowulf Mailing List Subject: Re: [Beowulf] [External] Spark, Julia,

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
On Tue, Oct 13, 2020 at 8:33 AM Michael Di Domenico wrote: > i can't speak from a general industry sense, but i've had everything > run through my center over the past 11 years. Hadoop seemed like > something that was going to take off. it didn't with my group of > users. we aren't counting cl

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Jonathan Aquilina via Beowulf
Hi Guys, How does Hadoop manage to crunch large data sets? What makes it and its siblings the go-to platform for big data? Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Michael Di Domenico Sent: 13 October 2020 14:32 Cc: Beowulf Mailing List Subject: Re: [Beowulf] [Ex

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Michael Di Domenico
i can't speak from a general industry sense, but i've had everything run through my center over the past 11 years. Hadoop seemed like something that was going to take off. it didn't with my group of users. we aren't counting clicks nor parsing text from huge files, so its utility to us faded. m

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Benson Muite
On 10/13/20 3:12 PM, Oddo Da wrote: Jim, Peter: by things have not changed in the tooling I meant that it is the same approach/paradigm as it was when I was in HPC back in the late 1990s/early 2000s. Even if you look at books about OpenMPI, you can go on their mailing list and ask what books t

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Oddo Da
Jim, Peter: by "things have not changed in the tooling" I meant that it is the same approach/paradigm as it was when I was in HPC back in the late 1990s/early 2000s. Even if you look at books about OpenMPI, you can go on their mailing list and ask what books to read and you will be pointed to the sam

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Jim Cownie
>> It just seems to me that things have not really changed in the tooling in >> the HPC space since 20+ years ago. It's also worth pointing out that the OpenMP of the year 2000 (OpenMP 2.0) is not the OpenMP of 2020 (OpenMP 5.1), (just as C++20 is not C++98), and, similarly MPI has also advanc

Re: [Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place

2020-10-13 Thread Peter Kjellström
On Mon, 12 Oct 2020 22:04:30 -0400 Oddo Da wrote: > Johann-Tobias, > > Thank you for the reply. > > I don't know enough detail about Julia to even be confused (I am > learning it now) :-) > > It just seems to me that things have not really changed in the > tooling in the HPC space since 20+ ye