Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread John Hanks
I pictured Doug standing in front of a crowd of problems. He shouts "you do not have to follow my methods, you are all individuals". The crowd replies in unison "we are all individuals", then one problem stands and says "I'm not". (Apologies to any non-Monty Python fans) There is a theoretical opt

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread John Hanks
Until an industry has had at least a decade of countries and institutions spending millions and millions of dollars designing systems to compete for a spot on a voluntary list based on arbitrary synthetic benchmarks, how can it possibly be taken seriously? I do sort of recall the early days of had

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread Douglas Eadline
> I suspect that you can take any hadoop/spark application and give it to a good C/C++/OpenMP/MPI coder and in six months, a year, two years, ..., you will end up with a much faster and much more efficient application. Meanwhile the original question the application was answering very likely

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread Douglas Eadline
> As I thought about that I decided it's worth expanding on stereotypical MPI user vs. stereotypical Spark user. In general if I ask each about the I/O pattern of their codes, I'll get: MPI user: "We open N files for read, M files for write, typically Y% of this is random r/w with some

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread Fabricio Cannini
On 30-12-2016 05:47, John Hanks wrote: This often gets presented as an either/or proposition and it's really not. We happily use SLURM to schedule the setup, run and teardown of spark clusters. At the end of the day it's all software, even the kernel and OS. The big secret of HPC is that in a
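The "SLURM schedules the setup, run and teardown of spark clusters" pattern mentioned above can be sketched as a batch script. This is a minimal illustration, not the poster's actual setup: the module name, `SPARK_HOME`, the application name `my_app.py`, and the resource sizes are all assumptions, and the helper script names (`start-master.sh`, `start-worker.sh`, which older Spark releases call `start-slave.sh`) come from Spark's standalone-mode distribution.

```shell
#!/bin/bash
#SBATCH --job-name=spark-cluster
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --time=02:00:00

# Illustrative only: site-specific module and paths are assumptions.
module load spark

# Setup: start the Spark master on the first node of the allocation.
MASTER_HOST=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export SPARK_MASTER_HOST="$MASTER_HOST"
"$SPARK_HOME/sbin/start-master.sh"

# Launch one standalone worker per allocated node, pointed at the master.
srun --ntasks="$SLURM_NNODES" \
     "$SPARK_HOME/sbin/start-worker.sh" "spark://${MASTER_HOST}:7077"

# Run the Spark application inside the allocation.
"$SPARK_HOME/bin/spark-submit" \
    --master "spark://${MASTER_HOST}:7077" my_app.py

# Teardown: stop the workers and master before the allocation ends.
srun --ntasks="$SLURM_NNODES" "$SPARK_HOME/sbin/stop-worker.sh"
"$SPARK_HOME/sbin/stop-master.sh"
```

The point of the sketch is the thread's argument: the Spark cluster is just another job, created and destroyed under the same scheduler as MPI work.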

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread John Hanks
I suspect that you can take any hadoop/spark application and give it to a good C/C++/OpenMP/MPI coder and in six months, a year, two years, ..., you will end up with a much faster and much more efficient application. Meanwhile the original question the application was answering very likely won't mat

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread Douglas Eadline
> Thanks John for your reply. Very interesting food for thought here. What I do understand between hadoop and spark is that spark is intended, I could be wrong here, as a replacement to hadoop as it performs better and faster than hadoop. Please allow me to let some air out of this idea tha

[Beowulf] CentOS 7.x for cluster nodes

2016-12-30 Thread m . somers
Hi, we have been using CentOS 7 for one year now on a ~120 node cluster and on a more recent ~40 node cluster. Details can be found at http://theor.lic.leidenuniv.nl/facilities Experiences: CentOS7 does just fine, systemd is a slight annoyance due to muscle memory of the service scripts I was u
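The "muscle memory of the service scripts" annoyance above refers to CentOS 7 replacing the SysV `service`/`chkconfig` tools with systemd's `systemctl`. A quick translation table, using `slurmd` purely as an example service name:

```shell
# SysV habit (CentOS 6)        systemd equivalent (CentOS 7)
service slurmd status      #   systemctl status slurmd
service slurmd restart     #   systemctl restart slurmd
chkconfig slurmd on        #   systemctl enable slurmd
chkconfig --list           #   systemctl list-unit-files --type=service
```

CentOS 7 ships compatibility shims, so the old `service` invocations mostly still work by redirecting to `systemctl`, which softens the transition on cluster nodes.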

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread John Hanks
As I thought about that I decided it's worth expanding on stereotypical MPI user vs. stereotypical Spark user. In general if I ask each about the I/O pattern of their codes, I'll get: MPI user: "We open N files for read, M files for write, typically Y% of this is random r/w with some streaming at

Re: [Beowulf] Beowulf Cluster VS Hadoop/Spark

2016-12-30 Thread Jonathan Aquilina
Thanks John for your reply. Very interesting food for thought here. What I do understand between hadoop and spark is that spark is intended, I could be wrong here, as a replacement to hadoop as it performs better and faster than hadoop. Is spark also java based? I never thought java to be so high