> On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline <deadl...@eadline.org> wrote:
>
>> Spark is a completely separate code base that has its own Map Reduce
>> engine. It can work stand-alone, with the YARN scheduler, or with
>> other schedulers. It can also take advantage of HDFS.
>
> Doug, this is correct. I think for all practical purposes Hadoop and Spark
> get lumped into the same bag because the underlying ideas are coming from
> the same place. A lot of people saw Spark (esp. at the beginning) as a
> much faster, in-memory Hadoop.
And then this "all or none, either/or" notion develops. That is, Spark is
better than Hadoop, so Hadoop is dead. The reality is that almost all
analytics projects require multiple tools. For instance, Spark is great,
but if you do some data munging of CSV files and want to store your
results at scale, you can't write a single file to your local file
system. Often you write the results as a Hive table to HDFS (e.g. in
Parquet format) so they are available for Hive SQL queries or for other
tools to use.

--
Doug

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf