The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and start it again.
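For reference, a sketch of what stopping a queue looks like in the queue configuration file. Element names here follow the 0.21-era `mapred-queues.xml` layout and may differ in 0.20.203, so check them against your exact release; the queue name is illustrative.

```xml
<!-- Hypothetical sketch: mark a queue stopped so it accepts no new jobs.
     After editing, the change is typically applied with
     "hadoop mradmin -refreshQueues" (no JobTracker restart needed). -->
<queues>
  <queue>
    <name>paused-queue</name>
    <!-- set back to "running" to start the queue again -->
    <state>stopped</state>
  </queue>
</queues>
```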
That will give you the behavior you described.

Arun

On Dec 12, 2011, at 5:50 AM, Dino Kečo wrote:

> Hi Hadoop users,
>
> In my company we have been using Hadoop for two years, and we need to pause
> and resume MapReduce jobs. I searched the Hadoop JIRA and found a couple of
> unresolved tickets on this, so we implemented our own solution. I would like
> to share this approach with you and hear your opinion about it.
>
> We created a special pool in the fair scheduler called PAUSE (maxMapTasks = 0,
> maxReduceTasks = 0). To pause a job, we move it into this pool and kill all of
> its running tasks. To resume it, we move the job into some other pool. While
> jobs are paused we can do maintenance on the whole cluster except the
> JobTracker, and also on some external services we depend on.
>
> We know that records being processed by the killed tasks will be reprocessed.
> In some cases we use the same HBase table as both input and output, and we
> save the job id on each record. When a record is reprocessed, we check this
> job id and skip the record if it was already processed by the same job.
>
> This logic is implemented in our custom fair scheduler, which is deployed to
> our cluster.
>
> Please share your comments and concerns about this approach.
>
> Regards,
> dino
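For readers unfamiliar with the fair scheduler's allocation file, the PAUSE pool Dino describes would look roughly like this. The `maxMaps`/`maxReduces` elements are the allocation-file counterparts of the per-pool task caps he mentions; whether they are available depends on your fair scheduler version, so treat this as a sketch, not a drop-in config.

```xml
<?xml version="1.0"?>
<!-- Hypothetical allocation-file sketch: a pool that can run no tasks,
     so any job moved into it is effectively paused. -->
<allocations>
  <pool name="PAUSE">
    <maxMaps>0</maxMaps>
    <maxReduces>0</maxReduces>
  </pool>
</allocations>
```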

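The job-id check that makes the kill-and-resume safe can be sketched in plain Java as below. This is an illustrative stand-in, not Dino's actual code: the record is modeled as a `Map` instead of an HBase row, and `JOB_ID_FIELD`, `alreadyProcessed`, and `processIfNeeded` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class JobIdDedup {
    // Stand-in for an HBase column (e.g. a "meta:jobid" cell) that stores
    // the id of the job that last wrote this record.
    static final String JOB_ID_FIELD = "jobid";

    /** True when this record was already written by the current job run. */
    static boolean alreadyProcessed(Map<String, String> record, String currentJobId) {
        return currentJobId.equals(record.get(JOB_ID_FIELD));
    }

    /**
     * Process the record only if a previous (killed) attempt of the same job
     * did not already write it; then stamp it with the current job id so a
     * later resume of this job will skip it.
     */
    static boolean processIfNeeded(Map<String, String> record, String currentJobId) {
        if (alreadyProcessed(record, currentJobId)) {
            return false; // skip: written before the tasks were killed
        }
        // ... the real per-record work (e.g. an HBase Put) would go here ...
        record.put(JOB_ID_FIELD, currentJobId);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        System.out.println(processIfNeeded(record, "job_201112120001_0042")); // processed
        System.out.println(processIfNeeded(record, "job_201112120001_0042")); // skipped on resume
        System.out.println(processIfNeeded(record, "job_201112120001_0043")); // a new job reprocesses it
    }
}
```

Note that a different job id deliberately reprocesses the record, which matches the post: only a resumed run of the *same* job skips its own earlier output.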