Suppose you have Hadoop jobs that are communication-bound (due to lots
of data shuffling between maps and reduces). What is the most
practical network bandwidth to strive for in such a cluster? I think
it should be the sustained read bandwidth of the disks on each node
times the number of nodes, since any more bandwidth than this could
not be utilized. Agree or disagree? If you disagree, could you explain
what you think it should be? Thanks.
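
To make the arithmetic behind that estimate concrete, here is a rough
sketch in Python. The cluster size, disk count, and per-disk read rate
are made-up numbers for illustration, not measurements, and the
function names are hypothetical. It only computes the upper bound
described above and ignores everything else (CPU, compression, data
locality).

```python
def per_node_disk_mb_s(disks_per_node, disk_read_mb_s):
    """Sustained read bandwidth of all disks on one node, in MB/s."""
    return disks_per_node * disk_read_mb_s


def aggregate_disk_mb_s(nodes, disks_per_node, disk_read_mb_s):
    """Per-node disk read bandwidth times the number of nodes, in MB/s."""
    return nodes * per_node_disk_mb_s(disks_per_node, disk_read_mb_s)


def mb_s_to_gbit_s(mb_s):
    """Convert MB/s to Gbit/s (8 bits per byte, 1000 Mbit per Gbit)."""
    return mb_s * 8 / 1000


if __name__ == "__main__":
    # Assumed example cluster: 20 nodes, 4 disks each, 100 MB/s per disk.
    nodes, disks, rate = 20, 4, 100
    node_bw = per_node_disk_mb_s(disks, rate)
    total_bw = aggregate_disk_mb_s(nodes, disks, rate)
    print("per node:", node_bw, "MB/s =", mb_s_to_gbit_s(node_bw), "Gbit/s")
    print("cluster: ", total_bw, "MB/s =", mb_s_to_gbit_s(total_bw), "Gbit/s")
```

With these assumed numbers, each node could source more than a single
1 Gbit/s link can carry, which is the kind of gap I am asking about.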
