Deepak,

You may read the SleepJob example implementation, which does exactly this.
It can be found inside a source tar-ball at $HADOOP_HOME/src/examples/org/apache/hadoop/examples/SleepJob.java, or online at:
http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.1/src/examples/org/apache/hadoop/examples/SleepJob.java

To run it, follow the instructions you get when you run:

$ hadoop jar $HADOOP_HOME/hadoop-examples.jar sleep

On Thu, Mar 15, 2012 at 10:27 PM, Deepak Nettem <[email protected]> wrote:
> Hi,
>
> I have a use case - I have files lying on the local disk of every node on
> my cluster. I want to write a Mapper-only MapReduce job that reads the file
> off the local disk on every machine, applies some transformation and writes
> to HDFS.
>
> Specifically,
>
> 1. The Job shouldn't have any input/output paths, and null key value pairs.
> 2. Mapper only
> 3. I want to be able to control the number of Mappers, depending on the
> size of my cluster.
>
> What's the best way to do this? I would appreciate any example code.
>
> Deepak

-- 
Harsh J
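P.S. The core trick SleepJob relies on is a custom InputFormat that fabricates a fixed number of empty splits, so the job needs no input path and the split count (i.e. the number of mappers) is whatever you configure. A rough sketch of that idea against the old `org.apache.hadoop.mapred` API might look like the following — this is my own illustrative class (EmptyInputFormat), not the actual SleepJob source, so treat the names as assumptions:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical sketch: an InputFormat that produces N empty splits,
// giving you N mappers with no input paths and null key/value pairs.
public class EmptyInputFormat
    implements InputFormat<NullWritable, NullWritable> {

  // A split that carries no data; one is created per desired mapper.
  public static class EmptySplit implements InputSplit {
    public void write(DataOutput out) throws IOException {}
    public void readFields(DataInput in) throws IOException {}
    public long getLength() { return 0L; }
    public String[] getLocations() { return new String[0]; }
  }

  public InputSplit[] getSplits(JobConf conf, int numSplits) {
    // numSplits is taken from conf.setNumMapTasks(n), so the driver
    // controls how many mappers run across the cluster.
    InputSplit[] splits = new InputSplit[numSplits];
    for (int i = 0; i < numSplits; i++) {
      splits[i] = new EmptySplit();
    }
    return splits;
  }

  public RecordReader<NullWritable, NullWritable> getRecordReader(
      InputSplit split, JobConf conf, Reporter reporter) {
    // Emit exactly one null record so map() is invoked once per split.
    return new RecordReader<NullWritable, NullWritable>() {
      private boolean sent = false;
      public boolean next(NullWritable key, NullWritable value) {
        if (sent) return false;
        sent = true;
        return true;
      }
      public NullWritable createKey() { return NullWritable.get(); }
      public NullWritable createValue() { return NullWritable.get(); }
      public long getPos() { return 0L; }
      public void close() {}
      public float getProgress() { return sent ? 1.0f : 0.0f; }
    };
  }
}
```

In the driver you would then set conf.setInputFormat(EmptyInputFormat.class), conf.setNumMapTasks(n) for the mapper count, and conf.setNumReduceTasks(0) to make the job mapper-only; inside map() you can open the node-local file with ordinary java.io and write to HDFS via the FileSystem API.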
