Suppose you have Hadoop jobs that are communication-bound (due to lots of data shuffling between maps and reduces). What is the most practical network bandwidth to strive for in such a cluster? I think it should be the sustained read bandwidth of the disks on each node times the number of nodes, since any bandwidth beyond that could not be utilized. Agree or disagree? If you disagree, could you explain what you think it should be? Thanks.
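To make the proposed bound concrete, here is a rough sketch of the arithmetic. The cluster size, disk count, and per-disk read rate are hypothetical numbers chosen only for illustration, and the function names are made up for this example.

```python
def aggregate_disk_bandwidth_mb_s(nodes, disks_per_node, disk_read_mb_s):
    """Proposed upper bound: total sustained disk read rate across the cluster (MB/s)."""
    return nodes * disks_per_node * disk_read_mb_s


def per_node_link_gbit_s(disks_per_node, disk_read_mb_s):
    """Network rate one node would need to ship everything its disks can read."""
    # MB/s -> Gbit/s using decimal units (1 MB = 8 Mbit, 1000 Mbit = 1 Gbit)
    return disks_per_node * disk_read_mb_s * 8 / 1000


# Hypothetical cluster: 20 nodes, 4 disks each, 100 MB/s sustained read per disk.
total = aggregate_disk_bandwidth_mb_s(20, 4, 100.0)
link = per_node_link_gbit_s(4, 100.0)
print(f"aggregate: {total:.0f} MB/s, per-node link: {link:.1f} Gbit/s")
# prints "aggregate: 8000 MB/s, per-node link: 3.2 Gbit/s"
```

Under these assumed numbers, the bound works out to about 3.2 Gbit/s per node, so a 1 Gbit/s link would be the bottleneck before the disks are.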
- Most Practical Bandwidth for a Hadoo... Jeff Kubina
- RE: Most Practical Bandwidth fo... Saqib Jang -- Margalla Communications
