In the case of a Distributed Shell job, the ApplicationMaster runs in a normal Linux
container and the subsequent shell command runs inside a Docker
container. Here the job fails even before the AM is launched, that is, before any
Docker container is started. I think the Distributed Shell job will fail even
without the Docker settings.

Error code 20 is most likely related to accessing the NM local
directory.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

20    INITIALIZE_USER_FAILED    Couldn't get, stat, or secure the per-user NodeManager directory.
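As a small sketch, the exit code can be pulled out of the AM diagnostics and matched against the entry above. The helper name `explain_exit_code` and the regular expression are assumptions, not part of YARN, and only code 20 is mapped because that is the only entry quoted from the Cloudera table.

```python
import re
from typing import Optional

# Only the entry quoted above from the Cloudera table; extend as needed.
CONTAINER_EXECUTOR_CODES = {
    20: ("INITIALIZE_USER_FAILED",
         "Couldn't get, stat, or secure the per-user NodeManager directory."),
}

# Matches the "initialization failed (exitCode=20)" part of the diagnostics.
# The AM exit status ("exitCode: -1000") uses a colon, so it is not matched.
_EXIT_RE = re.compile(r"exitCode=(-?\d+)")


def explain_exit_code(diagnostics: str) -> Optional[str]:
    """Return 'CODE NAME: description' for the first exitCode=N found, else None."""
    match = _EXIT_RE.search(diagnostics)
    if match is None:
        return None
    code = int(match.group(1))
    name, description = CONTAINER_EXECUTOR_CODES.get(
        code, ("UNKNOWN", "not in the table above"))
    return f"{code} {name}: {description}"


if __name__ == "__main__":
    diag = ("Application application_1550156488785_0002 initialization "
            "failed (exitCode=20) with output: main : command provided 0")
    print(explain_exit_code(diag))
```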

Can we try the steps below on all NodeManager machines?

Remove all contents under /data/yarn, and make sure that /data and /data/yarn
have permission 755 with owner root:root and that the local directory
is owned by yarn:hadoop.

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /
drwxr-xr-x.   5 root root    44 Oct 24 11:47 data

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
drwxr-xr-x. 4 root      root   28 Oct 24 14:30 yarn

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
total 4
drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log
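The layout above can also be checked programmatically on each NodeManager. This is a minimal sketch that assumes the /data/yarn paths and the root:root / yarn:hadoop ownership from this thread; the `EXPECTED` table and the function names are hypothetical. It only reports mismatches and does not change anything on disk.

```python
import grp
import os
import pwd
import stat
from typing import List, Optional

# (path, expected mode or None to skip the mode check, owner, group).
# Values follow the layout described in this thread; adjust for your cluster.
EXPECTED = [
    ("/data", 0o755, "root", "root"),
    ("/data/yarn", 0o755, "root", "root"),
    ("/data/yarn/local", None, "yarn", "hadoop"),
    ("/data/yarn/log", None, "yarn", "hadoop"),
]


def _user(uid: int) -> str:
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid without a passwd entry
        return str(uid)


def _group(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:  # gid without a group entry
        return str(gid)


def compare(path: str, mode: int, owner: str, group: str,
            want_mode: Optional[int], want_owner: str, want_group: str) -> List[str]:
    """Pure comparison, so it can be tested without touching the filesystem."""
    problems = []
    if want_mode is not None and mode != want_mode:
        problems.append(f"{path}: mode {oct(mode)} != {oct(want_mode)}")
    if owner != want_owner or group != want_group:
        problems.append(f"{path}: owner {owner}:{group} != {want_owner}:{want_group}")
    return problems


def check_layout(expected=EXPECTED) -> List[str]:
    problems = []
    for path, want_mode, want_owner, want_group in expected:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if not stat.S_ISDIR(st.st_mode):
            problems.append(f"{path}: not a directory")
            continue
        problems.extend(compare(path, stat.S_IMODE(st.st_mode),
                                _user(st.st_uid), _group(st.st_gid),
                                want_mode, want_owner, want_group))
    return problems


if __name__ == "__main__":
    for line in check_layout() or ["layout looks as expected"]:
        print(line)
```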

Also check whether Distributed Shell jobs run fine without the Docker settings.
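A minimal sketch for building the same Distributed Shell command with and without the Docker settings, so that only the runtime differs between the two runs. The jar path and image name are placeholders, and the `yarn jar <jar> -jar <jar>` pattern is an assumption; adjust it to however the job was launched from the blog. `-jar`, `-shell_command`, `-shell_env` and `-num_containers` are Distributed Shell client options, and `YARN_CONTAINER_RUNTIME_TYPE` / `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` are the environment variables that request the Docker runtime.

```python
import shlex
from typing import List, Optional

# Placeholder path: point this at the distributedshell jar of your install.
DSHELL_JAR = "/path/to/hadoop-yarn-applications-distributedshell.jar"


def dshell_command(jar: str = DSHELL_JAR, shell_command: str = "sleep 60",
                   docker_image: Optional[str] = None) -> List[str]:
    """Build a 'yarn jar' command line for Distributed Shell.

    With docker_image=None no runtime env vars are added, so the job uses
    the default (non-Docker) runtime: the comparison run suggested above.
    """
    cmd = ["yarn", "jar", jar,
           "-jar", jar,
           "-shell_command", shell_command,
           "-num_containers", "1"]
    if docker_image is not None:
        # These env vars apply to the shell command's container only; the
        # ApplicationMaster still runs in a regular container.
        cmd += ["-shell_env", "YARN_CONTAINER_RUNTIME_TYPE=docker",
                "-shell_env", f"YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={docker_image}"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(dshell_command()))                       # without Docker
    print(shlex.join(dshell_command(docker_image="centos")))  # with Docker
```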

On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap <[email protected]> wrote:

> Hi Prabhu,
>
> Thanks for your reply.
> I tried the configuration you suggested, but I get the
> same error.
> Is this related to container localization by any chance?
> Also, is there any log or output which shows that the Docker
> container runtime has been picked up?
>
>
>
> On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj <[email protected]>
> wrote:
>
>> Hi Vinay,
>>
>>     Can you try specifying the configs below under the Docker section in
>> container-executor.cfg? They will allow Docker containers to use the NM
>> local dirs.
>>
>>
>>     docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
>>     docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
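The mount lists are plain comma-separated values, so a stray empty entry (such as a doubled comma) or a NM local/log dir missing from the read-write list is easy to overlook. The sketch below parses `key=value` lines from container-executor.cfg and reports both. The function names and parsing rules (skip blank lines, `#` comments and `[section]` headers) are assumptions, not the container-executor's own parser, and it assumes, as suggested above, that the NM local and log dirs belong in `docker.allowed.rw-mounts`.

```python
from typing import Dict, List


def parse_cfg(text: str) -> Dict[str, str]:
    """Collect key=value pairs, ignoring blank lines, comments and [section] lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def check_mounts(cfg: Dict[str, str], nm_dirs: List[str]) -> List[str]:
    """Report empty mount entries and NM dirs missing from docker.allowed.rw-mounts."""
    problems = []
    for key in ("docker.allowed.ro-mounts", "docker.allowed.rw-mounts"):
        value = cfg.get(key)
        if value and any(e.strip() == "" for e in value.split(",")):
            problems.append(f"{key}: contains an empty entry")
    rw = {e.strip() for e in cfg.get("docker.allowed.rw-mounts", "").split(",")}
    for d in nm_dirs:
        if d not in rw:
            problems.append(f"{d}: not in docker.allowed.rw-mounts")
    return problems


if __name__ == "__main__":
    sample = """
[docker]
  docker.allowed.ro-mounts=/data/yarn/local,,/usr/jdk64/jdk1.8.0_112/bin
  docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
"""
    for p in check_mounts(parse_cfg(sample), ["/data/yarn/local", "/data/yarn/log"]):
        print(p)
```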
>>
>> Thanks,
>> Prabhu Joseph
>>
>> On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap <[email protected]>
>> wrote:
>>
>>>
>>> I am using Hadoop 3.2.0 and trying to run a simple application in a
>>> Docker container. I have made the required configuration changes in both
>>> *yarn-site.xml* and *container-executor.cfg* to select the
>>> LinuxContainerExecutor and the Docker runtime.
>>>
>>> I am using the distributed shell example from a Hortonworks blog post:
>>> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>>>
>>> The problem is that when the application is submitted to YARN, it
>>> fails with a directory creation error:
>>> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application
>>> report from ASM for, appId=2, clientToAMToken=null,
>>> appDiagnostics=Application application_1550156488785_0002 failed 2 times
>>> due to AM Container for appattempt_1550156488785_0002_000002 exited with
>>> exitCode: -1000 Failing this attempt.Diagnostics: [2019-02-14
>>> 20:51:16.282]Application application_1550156488785_0002 initialization
>>> failed (exitCode=20) with output: main : command provided 0 main : user is
>>> myuser main : requested yarn user is myuser Failed to create directory
>>> /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser
>>> - Not a directory
>>>
>>> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml, and I
>>> can see it reflected in the YARN web UI at *localhost:8088/conf*:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.local-dirs</name>
>>>     <value>/data/yarn/local</value>
>>>     <final>false</final>
>>>     <source>yarn-site.xml</source>
>>> </property>
>>>
>>> I do not understand why it is trying to create the usercache dir inside the
>>> nmPrivate directory.
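One way to see what went wrong is to compare the failing path with where the per-user directory normally lives, `<local-dir>/usercache/<user>`, which for the configured value would be /data/yarn/local/usercache/myuser. The sketch below reads yarn.nodemanager.local-dirs from XML like the snippet above and returns the part of the failing path in front of /usercache/. The helper names are hypothetical. In this case that prefix is the nmPrivate `.tokens` file, which suggests the token file path was used where a local dir was expected; that is an inference from the error message, not a confirmed diagnosis.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import List, Optional


def local_dirs_from_conf(xml_text: str) -> List[str]:
    """Return yarn.nodemanager.local-dirs from a <configuration> or single <property>."""
    root = ET.fromstring(xml_text)
    # Element.iter() also yields the root itself, so a bare <property> works too.
    for prop in root.iter("property"):
        if prop.findtext("name") == "yarn.nodemanager.local-dirs":
            value = prop.findtext("value") or ""
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def expected_usercache(local_dir: str, user: str) -> str:
    # Per-user data normally lives at <local-dir>/usercache/<user>.
    return posixpath.join(local_dir, "usercache", user)


def base_used(failing_path: str) -> Optional[str]:
    """Return whatever was used as the 'local dir' in front of /usercache/."""
    head, sep, _ = failing_path.partition("/usercache/")
    return head if sep else None


if __name__ == "__main__":
    conf = """<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/yarn/local</value>
</property>"""
    failing = ("/data/yarn/local/nmPrivate/"
               "container_1550156488785_0002_02_000001.tokens/usercache/myuser")
    dirs = local_dirs_from_conf(conf)
    print("configured :", dirs)
    print("expected   :", expected_usercache(dirs[0], "myuser"))
    print("actual base:", base_used(failing))
```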
>>>
>>> Note: I have verified myuser's permissions on the directories and
>>> have also tried clearing the directories manually, as suggested in a related
>>> post, but with no success. I do not see any additional information about the
>>> container launch failure in any other logs.
>>>
>>> How do I debug why the usercache dir is not resolved properly?
>>>
>>> I would really appreciate any help on this.
>>>
>>> Thanks
>>>
>>> Vinay Kashyap
>>>
>>
>
> --
> *Thanks and regards*
> *Vinay Kashyap*
>
