Hi, I need to run some benchmarking tests for a given MapReduce job on a *subset* of a 10-node Hadoop cluster. Not that it matters, but the current cluster settings allow for ~20 map slots and 10 reduce slots per node.
Without loss of generality, let's say I want a job with the constraints below:
- to use only *5* out of the 10 nodes for running the mappers,
- to use only *5* out of the 10 nodes for running the reducers.

Is there any way of achieving this through Hadoop property overrides at job-submission time? I understand that the Fair Scheduler could potentially be used to create pools with a proportionate number of map and reduce slots, which achieves a similar outcome, but the problem is that I still cannot tie such a pool to a fixed number of machines (right?). Essentially, regardless of the number of map/reduce tasks involved, I only want a *fixed number of machines* to handle the job. Any tips on how I can go about achieving this?

Thanks,
Safdar
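P.S. For concreteness, the Fair Scheduler pool cap I was referring to would look something like the sketch below in the scheduler's allocations file (the pool name "benchmark" is just an example; 100 map / 50 reduce slots = 5 nodes' worth at 20 map + 10 reduce slots per node). As noted above, this only caps total slot counts for the pool, it does not pin the job to any particular 5 TaskTrackers:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool capped at 5 nodes' worth of slots:
       5 nodes x 20 map slots  = 100 maps
       5 nodes x 10 reduce slots = 50 reduces -->
  <pool name="benchmark">
    <maxMaps>100</maxMaps>
    <maxReduces>50</maxReduces>
  </pool>
</allocations>
```

The job would then be submitted into that pool (e.g. by setting the pool-name property at submission time), but the scheduler is still free to hand those 100/50 slots to any of the 10 nodes.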
