I ran into an issue where YARN does not seem to be starting containers again for an application after some of its containers died. The details are outlined in fluo#657 [1].
Twill seems to be trying to restart the containers, but YARN does not appear to be starting them. According to the YARN RM web page, there are enough cores and memory available to start the containers, so I am not sure why it's not starting them. Does anyone have any tips for debugging this issue, or have a second to look at the logs attached to fluo#657?

[1]: https://github.com/fluo-io/fluo/issues/657
