There is an idle timeout for map/reduce tasks. If a task makes no progress for 10 minutes (the default), the AM will kill it on 2.0 and the JT will kill it on 1.0. But I don't know of anything associated with a job as a whole, other than that in 0.23, if the AM does not heartbeat back in for too long, I believe the RM may kill it and retry, but I don't know for sure.
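For reference, the per-task idle timeout Bobby describes is governed by a single property; a minimal mapred-site.xml sketch, assuming the Hadoop 1.x key name (2.x/0.23 renamed it `mapreduce.task.timeout`):

```xml
<!-- mapred-site.xml: kill a task that reports no progress for this long.
     Value is in milliseconds; 600000 (10 minutes) is the default,
     and 0 disables the timeout entirely. Key shown is the 1.x name;
     on 2.x/0.23 the equivalent key is mapreduce.task.timeout. -->
<property>
  <name>mapred.task.timeout</name>
  <value>600000</value>
</property>
```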
--Bobby Evans

On 5/11/12 10:53 AM, "Harsh J" <[email protected]> wrote:

Am not aware of a job-level timeout or idle monitor.

On Fri, May 11, 2012 at 7:33 PM, Shi Yu <[email protected]> wrote:
> Is there any risk to suppressing a job too long in FS? I guess there are
> some parameters to control the waiting time of a job (such as a timeout).
> For example, if a job is kept idle for more than 24 hours, is there
> a configuration deciding whether to kill or keep that job?
>
> Shi
>
> On 5/11/2012 6:52 AM, Rita wrote:
>> Thanks. I think I will investigate the capacity scheduler.
>>
>> On Fri, May 11, 2012 at 7:26 AM, Michael Segel <[email protected]> wrote:
>>
>>> Just a quick note...
>>>
>>> If your task is currently occupying a slot, the only way to release the
>>> slot is to kill the specific task.
>>> If you are using FS, you can move the task to another queue and/or you
>>> can lower the job's priority, which will cause its new tasks to spawn
>>> more slowly than other jobs', so you will eventually free up the cluster.
>>>
>>> There isn't a way to 'freeze' or stop a job mid-state.
>>>
>>> Is the issue that the job occupies a large number of slots, or is it an
>>> issue of the individual tasks taking a long time to complete?
>>>
>>> If it's the latter, you will probably want to go to the capacity
>>> scheduler over the fair scheduler.
>>>
>>> HTH
>>>
>>> -Mike
>>>
>>> On May 11, 2012, at 6:08 AM, Harsh J wrote:
>>>
>>>> I do not know about the per-host slot control (that is most likely not
>>>> supported, or not yet anyway - and perhaps feels wrong to do), but the
>>>> rest of the needs are doable if you use schedulers and queues/pools.
>>>>
>>>> If you use FairScheduler (FS), ensure that this job always goes to a
>>>> special pool, and when you want to freeze the pool, simply set the
>>>> pool's maxMaps and maxReduces to 0. Likewise, control max simultaneous
>>>> tasks as you wish, to constrict instead of freeze.
>>>> When you make changes to the FairScheduler configs, you do not need
>>>> to restart the JT; simply wait a few seconds for FairScheduler to
>>>> refresh its own configs.
>>>>
>>>> More on FS at
>>>> http://hadoop.apache.org/common/docs/current/fair_scheduler.html
>>>>
>>>> If you use CapacityScheduler (CS), then I believe you can do this by
>>>> again making sure the job goes to a specific queue, and when you need
>>>> to freeze it, simply set the queue's maximum-capacity to 0 (percent);
>>>> to constrict it instead, choose a lower, positive percentage value as
>>>> you need. You can also have CS pick up config changes by refreshing
>>>> queues via mradmin.
>>>>
>>>> More on CS at
>>>> http://hadoop.apache.org/common/docs/current/capacity_scheduler.html
>>>>
>>>> Neither approach will freeze/constrict the job immediately, but
>>>> either should certainly prevent it from progressing. That is, tasks
>>>> already running when the scheduler config is changed will continue
>>>> to run to completion, but further task scheduling from those jobs
>>>> will see the effect of the changes.
>>>>
>>>> P.S. A better solution would be to make your job not take as many
>>>> days, somehow? :-)
>>>>
>>>> On Fri, May 11, 2012 at 4:13 PM, Rita <[email protected]> wrote:
>>>>>
>>>>> I have a rather large map-reduce job which takes a few days. I was
>>>>> wondering if it's possible for me to freeze the job or make the job
>>>>> less intensive. Is it possible to reduce the number of slots per
>>>>> host and then increase them overnight?
>>>>>
>>>>> tia
>>>>>
>>>>> --
>>>>> --- Get your facts first, then you can distort them as you please.--
>>>>
>>>> --
>>>> Harsh J

--
Harsh J
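Harsh's FairScheduler freeze can be sketched as an allocation-file entry. This is a minimal sketch assuming the 1.x fair scheduler's allocation-file format; the pool name `longjob` is a placeholder for whatever pool the job is submitted to:

```xml
<!-- Fair scheduler allocation file (e.g. fair-scheduler.xml).
     Setting both caps to 0 "freezes" the pool: no new tasks are
     scheduled from it, but tasks already running finish normally.
     The pool name "longjob" is hypothetical. -->
<allocations>
  <pool name="longjob">
    <maxMaps>0</maxMaps>
    <maxReduces>0</maxReduces>
  </pool>
</allocations>
```

As noted in the thread, the FairScheduler re-reads this file on its own after a short delay, so no JobTracker restart is needed; to resume the job, raise or remove the caps. On the CapacityScheduler route, the analogous knob is the queue's maximum-capacity setting in capacity-scheduler.xml, picked up by refreshing queues via mradmin.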
