Re: Help debugging YARN/Twill issue

Terence Yim Thu, 05 May 2016 23:44:35 -0700

Hi Keith,

That seems really strange. Does it always fail to acquire new containers after
a fixed number of container restarts, or is it more or less random? I have a
cluster whose containers have been restarted over 200 times (because the
runnable uses close to its memory limit and occasionally gets killed by the
NM), and it is still running fine.


Also, have you tried a different scheduler (e.g. the Fair Scheduler) to see if
you get the same result?
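
In case it helps, switching schedulers is normally a one-property change in
yarn-site.xml on the ResourceManager. A minimal sketch, assuming a stock
Hadoop 2.x setup with no other scheduler-specific settings (the RM has to be
restarted to pick it up, and queue configuration may need adjusting for your
cluster):

```xml
<!-- yarn-site.xml on the ResourceManager host.
     Assumption: no Fair Scheduler allocation file is configured, so the
     scheduler falls back to its default queue behaviour. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

If the containers come back under the Fair Scheduler, that would point at the
Capacity Scheduler rather than at Twill.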

Terence


> On May 4, 2016, at 11:57 AM, Keith Turner <[email protected]> wrote:
> 
> On Wed, May 4, 2016 at 2:14 PM, Terence Yim <[email protected]> wrote:
> 
>> Hi Keith,
>> 
>> What is the Hadoop version you are using? Judging from the log, it could be
>> a bug in the Capacity scheduler[1].
>> 
> 
> I am using Hadoop 2.6.3.  So that bug should be fixed.
> 
> 
>> Also, have you looked at the node manager log of the node "worker14:40196"?
>> 
> 
> No, I had not; that's a good idea.  I grepped that log for the yarn app id
> 1462212200762_0008 and saw nothing pertinent.  I also looked around the
> time of the error message in the RM and saw nothing pertinent.
> 
> 
>> 
>> [1] https://issues.apache.org/jira/browse/YARN-2628
>> 
>> Terence
>> 
>> On Wed, May 4, 2016 at 8:44 AM, Keith Turner <[email protected]> wrote:
>> 
>>> I ran into an issue where YARN does not seem to be starting containers
>>> again for an application after some containers died.  The details of the
>>> issue I am running into are outlined in fluo#657 [1].
>>> 
>>> Twill seems to be trying to restart the containers, but it seems YARN is
>>> not doing it.  Looking at the YARN RM web page, there are enough cores and
>>> memory available to start the containers, so I am not sure why it's not
>>> starting them.
>>> 
>>> Does anyone have any tips for debugging this issue, or have a second to
>>> look at the logs attached to fluo#657?
>>> 
>>> [1] : https://github.com/fluo-io/fluo/issues/657
>>> 
>>
