Hi Ben,

Re-added the list.
I think where I'm confused is this: isn't that exactly what Hadoop/Spark does, i.e. distribute the data for computation and then aggregate the results back into a single data set? Correct me if I am wrong here.

Another thing I can't seem to understand is how a Java-based platform manages to achieve such good performance when crunching the large data sets involved in big data analytics.

Regards,
Jonathan

-----Original Message-----
From: Benjamin Redling <benjamin.ra...@uni-jena.de>
Sent: 24 November 2020 09:03
To: Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject: Re: [Beowulf] Clustering vs Hadoop/spark

Hello Jonathan,

On 24/11/2020 06.22, Jonathan Aquilina via Beowulf wrote:
> I am just wondering what advantages does setting up of a cluster have
> in relation to big data analytics vs using something like Hadoop/spark?

Can you distribute any application without programming against a framework?
We distribute a lot of data-parallel tasks via SLURM, with the source code unchanged.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Redling
☎ +49 3641 9 44323

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
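Jonathan's description of Hadoop/Spark (distribute the data for computation, then aggregate the results back into a single data set) is the classic split / compute / aggregate pattern, and the pattern itself does not require a framework. The Python sketch below is a minimal illustration under stated assumptions, not how Hadoop or Spark are implemented: the "data set" is a list of text lines, the per-chunk computation is a word count, and the helper names (`split`, `count_words`, `aggregate`, `word_count`) are hypothetical. Local processes from `multiprocessing.Pool` stand in for cluster nodes.

```python
"""Split / compute / aggregate without Hadoop or Spark: a minimal sketch.

Hypothetical assumptions: the data set is a list of text lines, the per-chunk
work is a word count, and local processes stand in for cluster nodes.
"""
from collections import Counter
from multiprocessing import Pool


def split(data, n_chunks):
    """Scatter step: cut the data into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(chunk):
    """Compute step: a worker sees only its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def aggregate(partials):
    """Aggregate step: merge the per-chunk results into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(data, n_chunks=4, map_fn=map):
    """Run the pipeline; map_fn decides where the chunks are processed."""
    chunks = split(data, n_chunks)
    partials = map_fn(count_words, chunks)
    return aggregate(partials)


if __name__ == "__main__":
    lines = ["spark and hadoop", "slurm and mpi", "hadoop hadoop"]
    # Serial run with the built-in map.
    print(word_count(lines, n_chunks=2))
    # Same pipeline, with the chunks processed by two worker processes.
    with Pool(processes=2) as pool:
        print(word_count(lines, n_chunks=2, map_fn=pool.map))
```

Passing the built-in `map` processes the chunks one after another; passing `pool.map` processes them in parallel worker processes, and both runs produce the same counts. On a SLURM cluster one could, as an assumption, instead submit each chunk as one task of a job array and have a small wrapper hand the chunk selected by the `SLURM_ARRAY_TASK_ID` environment variable to an unchanged program, which is roughly in the spirit of Benjamin's approach.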