Hi, I need to run some benchmarking tests for a given MapReduce job on a *subset* of a 10-node Hadoop cluster. Not that it matters, but the current cluster settings allow for ~20 map slots and 10 reduce slots per node.
Without loss of generality, let's say I want a job with the constraints below:
- to use only *5* out of the 10 nodes for running the mappers,
- to use only *5* out of the 10 nodes for running the reducers.

Is there any way of achieving this through Hadoop property overrides at job-submission time? I understand that the Fair Scheduler could potentially be used to create pools with a proportionate number of map and reduce slots, which achieves a similar outcome, but the problem is that I still cannot tie such a pool to a fixed number of machines (right?). Essentially, regardless of the number of map/reduce tasks involved, I only want a *fixed number of machines* to handle the job. Any tips on how I can go about achieving this?

Thanks,
Safdar
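P.S. For concreteness, the Fair Scheduler pool cap I was referring to would look something like the sketch below in the scheduler's allocations file (the pool name "benchmark" is just an example; 100 map / 50 reduce slots = 5 nodes' worth at 20 map + 10 reduce slots per node). As noted above, this only caps total slot counts for the pool, it does not pin the job to any particular 5 TaskTrackers:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool capped at 5 nodes' worth of slots:
       5 nodes x 20 map slots  = 100 maps
       5 nodes x 10 reduce slots = 50 reduces -->
  <pool name="benchmark">
    <maxMaps>100</maxMaps>
    <maxReduces>50</maxReduces>
  </pool>
</allocations>
```

The job would then be submitted into that pool (e.g. by setting the pool-name property at submission time), but the scheduler is still free to hand those 100/50 slots to any of the 10 nodes.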
