Doug, thank you for taking the time! Your Julia comments are in line with
my impression of it, hence the initial question I posed in this thread.
Thank you for all your insights.
On Tue, Oct 13, 2020 at 5:03 PM Douglas Eadline
wrote:
>
> > On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline
> > wro
On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline
wrote:
>
> It really depends on what you need to do with Hadoop or Spark.
> IMO many organizations don't have enough data to justify
> standing up a 16-24 node cluster system with a PB of HDFS.
>
Excellent. If I understand what you are saying, ther
On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline
wrote:
>
> The reality is almost all Analytics projects require multiple
> tools. For instance, Spark is great, but if you do some
> data munging of CSV files and want to store your results
> at scale you can't write a single file to your local file
On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline
wrote:
>
> Spark is a completely separate code base that has its own Map Reduce
> engine. It can work stand-alone, with the YARN scheduler, or with
> other schedulers. It can also take advantage of HDFS.
>
Doug, this is correct. I think for all pra
> Hi Doug,
>
> How have they managed to squeeze so much performance out of Java for such
> big data sets?
Nothing to do with Java; it originally had to do with "moving computation
to data" (Hadoop YARN can provide data locality for Map Reduce,
i.e. large files are sliced on HDFS data nodes, the Map
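For readers wondering what "moving computation to data" means in practice, here is a toy Python sketch (not Hadoop code): a file is cut into fixed-size blocks, each block is replicated on several data nodes, and the scheduler tries to run each block's map task on a node that already holds a copy, falling back to a remote read only when no such node has a free slot. The 128 MB block size and 3 replicas are chosen to mirror common HDFS defaults; the round-robin replica placement, node names, and per-node slot counts are simplifying assumptions (real HDFS placement is rack-aware).

```python
# Toy model of "moving computation to data" (not Hadoop code).
# Assumptions: 128 MB blocks and 3 replicas (common HDFS defaults),
# round-robin replica placement (real HDFS placement is rack-aware),
# and hypothetical node names and per-node task slots.

BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file occupies (ceiling division)."""
    return -(-file_size_mb // block_size_mb)


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Put replicas of block b on `replication` consecutive nodes, round-robin."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def schedule_map_tasks(placement, nodes, slots_per_node):
    """Run each block's map task on a node holding a replica when it has a free
    slot; otherwise fall back to any free node, which must read the block remotely.
    Returns (block -> node assignment, number of data-local tasks)."""
    free = {n: slots_per_node for n in nodes}
    assignment, local = {}, 0
    for block, replicas in placement.items():
        target = next((n for n in replicas if free[n] > 0), None)
        if target is not None:
            local += 1
        else:
            target = next((n for n in nodes if free[n] > 0), None)
            if target is None:
                raise RuntimeError("more map tasks than free slots")
        free[target] -= 1
        assignment[block] = target
    return assignment, local


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]
    blocks = split_into_blocks(1024)  # a 1 GB file -> 8 blocks
    placement = place_replicas(blocks, nodes)
    assignment, local = schedule_map_tasks(placement, nodes, slots_per_node=2)
    print(f"{local} of {blocks} map tasks are data-local")
```

The greedy "local first, remote if necessary" rule is the whole idea: the block stays put and the task goes to it, so the network only carries data when every node holding a replica is busy.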
Hi Doug,
How have they managed to squeeze so much performance out of Java for such big
data sets?
Regards,
Jonathan
-Original Message-
From: Beowulf On Behalf Of Douglas Eadline
Sent: 13 October 2020 15:55
To: Oddo Da
Cc: beowulf@beowulf.org
Subject: [Beowulf] ***UNCHECKED*** Re: Spar
I have noticed a lot of Hadoop/Spark references in the replies.
The word "Hadoop" is probably the most misunderstood
word in computing today, and many people have only a
vague idea of what it actually is.
Hadoop V1 was a monolithic Map Reduce framework written in
Java. (BTW Map Reduce is a SIMD al
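As a rough illustration of the Map Reduce pattern itself (plain single-process Python, not the Hadoop Java API), the classic word count below applies the same map function independently to every input record, groups the emitted pairs by key in a shuffle step, and then reduces each group. On a cluster the map and reduce calls would run on many nodes, and only the shuffle moves data between them. The function names are hypothetical.

```python
# Word count in the Map Reduce style, single process (not the Hadoop API).
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(records):
    # The same map function runs independently on every record, so this step
    # could be spread across as many workers as there are input splits.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(map_reduce(["the quick fox", "The lazy dog", "the end"]))
    # prints: {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```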
On Tue, Oct 13, 2020 at 8:52 AM Guy Coates wrote:
>
> Having just spent some time looking at parallelising some ML/AI workloads, it
> was enlightening to see that as you scratch beneath the various frameworks
> like pytorch or horovod, you find...MPI. And RDMA. And workloads that can
> quickly
On Tue, Oct 13, 2020 at 8:39 AM Oddo Da wrote:
>
> Michael, thank you for the insight. I think Hadoop in general is mostly
> dying; Spark is really the derivative that took off. Basically, what you are
> saying is that there is no demand on your infra for this kind of work. Do you
> have any in
> . MPI is becoming a bit
> like COBOL. Everyone claims it's dead, but it's still around.
>
>
Having just spent some time looking at parallelising some ML/AI workloads,
it was enlightening to see that as you scratch beneath the various
frameworks like pytorch or horovod, you find...MPI. And RD
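To make the point about MPI underneath the ML frameworks concrete: data-parallel training repeatedly sums gradients across workers with an allreduce, the operation MPI exposes as MPI_Allreduce. The sketch below simulates, in one Python process, the ring algorithm commonly used for large allreduces (Horovod's authors describe adopting it): a reduce-scatter phase followed by an all-gather phase, each taking n-1 steps in which every rank sends one chunk to its neighbor. It is an illustration under those assumptions, not how PyTorch or Horovod are actually coded.

```python
# Single-process simulation of ring allreduce (sum). Illustrative only:
# real code would call an MPI or NCCL collective instead of copying lists.


def ring_allreduce(buffers):
    """Return what each of the n ranks holds after summing `buffers` elementwise.

    Each rank's buffer is split into n chunks. In the reduce-scatter phase,
    at step s rank r sends chunk (r - s) % n to rank (r + 1) % n, which adds it
    to its own copy; after n - 1 steps rank r holds the full sum of chunk
    (r + 1) % n. In the all-gather phase, at step s rank r forwards chunk
    (r + 1 - s) % n, so after n - 1 more steps every rank has every summed chunk.
    """
    n = len(buffers)
    length = len(buffers[0])
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(buf) for buf in buffers]  # copy so inputs are untouched

    def exchange(chunk_for_rank, combine):
        # Collect all sends first so every rank sends its pre-step values,
        # as if all messages in one step travel simultaneously.
        sends = []
        for r in range(n):
            lo, hi = bounds[chunk_for_rank(r)]
            sends.append(((r + 1) % n, lo, data[r][lo:hi]))
        for dst, lo, chunk in sends:
            for i, value in enumerate(chunk):
                data[dst][lo + i] = combine(data[dst][lo + i], value)

    for step in range(n - 1):  # reduce-scatter: accumulate partial sums
        exchange(lambda r: (r - step) % n, lambda mine, incoming: mine + incoming)
    for step in range(n - 1):  # all-gather: overwrite with finished sums
        exchange(lambda r: (r + 1 - step) % n, lambda mine, incoming: incoming)
    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(ring_allreduce(grads))  # every rank ends with [111, 222, 333, 444]
```

Each rank sends about 2(n-1)/n of its buffer in total, which stays nearly constant as n grows; that bandwidth property is why the ring layout is popular for large gradient buffers, and why fast interconnects and RDMA matter so much underneath these frameworks.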
Looking at the question I posted earlier: how is it that Java performs so well
when it comes to large data sets?
Regards,
Jonathan
From: Beowulf On Behalf Of Oddo Da
Sent: 13 October 2020 14:38
To: Michael Di Domenico
Cc: Beowulf Mailing List
Subject: Re: [Beowulf] [External] Spark, Julia,
On Tue, Oct 13, 2020 at 8:33 AM Michael Di Domenico
wrote:
> I can't speak from a general industry sense, but I've had everything
> run through my center over the past 11 years. Hadoop seemed like
> something that was going to take off. It didn't with my group of
> users. We aren't counting cl
Hi Guys,
How does Hadoop manage to crunch large data sets? What makes it, and its
siblings, the go-to platform for big data?
Regards,
Jonathan
-Original Message-
From: Beowulf On Behalf Of Michael Di Domenico
Sent: 13 October 2020 14:32
Cc: Beowulf Mailing List
Subject: Re: [Beowulf] [Ex
I can't speak from a general industry sense, but I've had everything
run through my center over the past 11 years. Hadoop seemed like
something that was going to take off. It didn't with my group of
users. We aren't counting clicks or parsing text from huge files, so
its utility to us faded. m
On 10/13/20 3:12 PM, Oddo Da wrote:
Jim, Peter: by "things have not changed in the tooling" I meant that it is
the same approach/paradigm as it was when I was in HPC back in the late
1990s/early 2000s. Even if you look at books about OpenMPI, you can go on
their mailing list and ask what books to read and you will be pointed to
the sam
>> It just seems to me that things have not really changed in the tooling in
>> the HPC space since 20+ years ago.
It's also worth pointing out that the OpenMP of the year 2000 (OpenMP 2.0) is
not the OpenMP of 2020 (OpenMP 5.1), just as C++20 is not C++98, and
similarly MPI has also advanc
On Mon, 12 Oct 2020 22:04:30 -0400
Oddo Da wrote:
> Johann-Tobias,
>
> Thank you for the reply.
>
> I don't know enough detail about Julia to even be confused (I am
> learning it now) :-)
>
> It just seems to me that things have not really changed in the
> tooling in the HPC space since 20+ ye