> On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline <deadl...@eadline.org> wrote:
>
>> Spark is a completely separate code base that has its own Map Reduce
>> engine. It can work stand-alone, with the YARN scheduler, or with
>> other schedulers. It can also take advantage of HDFS.
>
> Doug, this is correct. I think for all practical purposes Hadoop and Spark
> get lumped into the same bag because the underlying ideas are coming from
> the same place. A lot of people saw Spark (esp. at the beginning) as a
> much faster, in-memory Hadoop.
And then this "all or none, either/or" notion develops. That is, Spark is
better than Hadoop, so Hadoop is dead. The reality is that almost all
analytics projects require multiple tools. For instance, Spark is great,
but if you do some data munging of CSV files and want to store your
results at scale, you can't write a single file to your local file
system. Often you write the results as a Hive table to HDFS (e.g. in
Parquet format) so they are available for Hive SQL queries or for other
tools to use.

--
Doug

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf