Most Practical Bandwidth for a Hadoop Cluster?

Jeff Kubina Thu, 15 Mar 2012 11:31:14 -0700

Suppose you have Hadoop jobs that are communication-bound because of
heavy data shuffling between maps and reduces. What is the most
practical network bandwidth to strive for in such a cluster? I think
it should be the sustained read bandwidth of the disks on each node
times the number of nodes, since any more bandwidth than this could
not be utilized. Agree or disagree? If you disagree, could you explain
what you think it should be? Thanks.
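
To make the estimate concrete, here is a rough sketch of the arithmetic.
The node count, disks per node, and per-disk read rate are made-up
illustrative numbers, not measurements from any real cluster, and the
function names are hypothetical.

```python
def aggregate_disk_read_mb_per_s(nodes, disks_per_node, disk_read_mb_per_s):
    """Proposed ceiling: total sustained disk read rate of the cluster, in MB/s."""
    return nodes * disks_per_node * disk_read_mb_per_s


def per_node_link_gbit_per_s(disks_per_node, disk_read_mb_per_s):
    """Network rate one node needs to ship data as fast as its disks can read it."""
    # 1 MB/s = 8 Mbit/s; divide by 1000 to express the result in Gbit/s.
    return disks_per_node * disk_read_mb_per_s * 8 / 1000


if __name__ == "__main__":
    # Assumed example: 20 nodes, 4 disks per node, 100 MB/s sustained per disk.
    total = aggregate_disk_read_mb_per_s(20, 4, 100.0)
    link = per_node_link_gbit_per_s(4, 100.0)
    print(f"aggregate disk read rate: {total:.0f} MB/s")
    print(f"per-node link to match:   {link:.1f} Gbit/s")
```

Under these assumed numbers, each node would need a bit more than three
1 Gbit/s links' worth of bandwidth before the network stopped being the
limit and the disks took over.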
