For a Distributed Shell job, the ApplicationMaster runs in a normal Linux container and the subsequent shell command runs inside a Docker container. The job fails even before the AM is launched, that is, before any Docker container is started. I think the Distributed Shell job would fail even without the Docker settings.
Error code 20 is most likely related to accessing the NM local directory.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

20 INITIALIZE_USER_FAILED Couldn't get, stat, or secure the per-user NodeManager directory.

Can we try the steps below on all NodeManager machines? Remove all contents under /data/yarn, and make sure the /data and /data/yarn directories have permission 755 with owner root:root and that the local directory is owned by yarn:hadoop.

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /
drwxr-xr-x. 5 root root 44 Oct 24 11:47 data
[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
drwxr-xr-x. 4 root root 28 Oct 24 14:30 yarn
[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
total 4
drwxr-xr-x. 5 yarn hadoop 54 Feb 14 17:32 local
drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log

Also check whether the Distributed Shell job runs fine without the Docker settings.

On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap <[email protected]> wrote:

> Hi Prabhu,
>
> Thanks for your reply.
> I tried the configurations as per your suggestion, but I get the same error.
> Is this related to container localization by any chance?
> Also, is there any log or output which says that the Docker container runtime has been picked up?
>
> On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj <[email protected]> wrote:
>
>> Hi Vinay,
>>
>> Can you try specifying the configs below under the Docker section in container-executor.cfg, which will allow Docker containers to use the NM Local Dirs.
>>
>> docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
>> docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
>>
>> Thanks,
>> Prabhu Joseph
>>
>> On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap <[email protected]> wrote:
>>
>>> I am using Hadoop 3.2.0 and trying to run a simple application in a Docker container. I have made the required configuration changes in both *yarn-site.xml* and *container-executor.cfg* to choose LinuxContainerExecutor and the Docker runtime.
>>>
>>> I am using the distributed shell example from one of the Hortonworks blogs.
>>> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>>>
>>> The problem I face is that when the application is submitted to YARN, it fails with a directory creation error, shown below:
>>>
>>> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application report from ASM for, appId=2, clientToAMToken=null, appDiagnostics=Application application_1550156488785_0002 failed 2 times due to AM Container for appattempt_1550156488785_0002_000002 exited with exitCode: -1000
>>> Failing this attempt.Diagnostics: [2019-02-14 20:51:16.282]Application application_1550156488785_0002 initialization failed (exitCode=20) with output:
>>> main : command provided 0
>>> main : user is myuser
>>> main : requested yarn user is myuser
>>> Failed to create directory /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser - Not a directory
>>>
>>> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml and I can see the same reflected in the YARN web UI at *localhost:8088/conf*
>>>
>>> <property>
>>> <name>yarn.nodemanager.local-dirs</name>
>>> <value>/data/yarn/local</value>
>>> <final>false</final>
>>> <source>yarn-site.xml</source>
>>> </property>
>>>
>>> I do not understand why it is trying to create the usercache dir inside the nmPrivate directory.
>>>
>>> Note: I have verified the permissions for myuser on the directories and have also tried clearing the directories manually, as suggested in a related post, but without success. I do not see any additional information about the container launch failure in any other logs.
>>>
>>> How do I debug why the usercache dir is not resolved properly?
>>>
>>> I would really appreciate any help on this.
>>>
>>> Thanks
>>>
>>> Vinay Kashyap
>>>
>>
>
> --
> *Thanks and regards*
> *Vinay Kashyap*
>
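
The ownership and permission check suggested in this thread could be scripted so that it is easy to repeat on every NodeManager host. The snippet below is only a minimal sketch: check_dir is a hypothetical helper (it is not part of Hadoop), and it assumes bash and GNU coreutils stat are available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): compare a directory's octal mode
# and owner:group with the expected values. Assumes GNU coreutils `stat`.
check_dir() {
  local path="$1" want_mode="$2" want_owner="$3"
  local mode owner

  # The container-executor error in this thread was "Not a directory",
  # so first confirm the path really is a directory.
  if [ ! -d "$path" ]; then
    echo "MISSING  $path (not a directory)"
    return 1
  fi

  mode="$(stat -c '%a' "$path")"       # e.g. 755
  owner="$(stat -c '%U:%G' "$path")"   # e.g. root:root

  if [ "$mode" = "$want_mode" ] && [ "$owner" = "$want_owner" ]; then
    echo "OK       $path $mode $owner"
  else
    echo "MISMATCH $path is $mode $owner, expected $want_mode $want_owner"
    return 1
  fi
}
```

For the layout shown in the listing above, the calls would look like this (paths and owners are taken from that listing; adjust them if yarn.nodemanager.local-dirs differs):

    check_dir /data 755 root:root
    check_dir /data/yarn 755 root:root
    check_dir /data/yarn/local 755 yarn:hadoop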
