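Can you test again after adding "local" to docker.trusted.registries in container-executor.cfg? Your image name (local/openjdk:8.1) uses the "local" prefix, so that prefix has to be listed as a trusted registry.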
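To check this quickly on a NodeManager host, here is a rough sketch of a helper that reads docker.trusted.registries from a given config file and reports whether an image's prefix is listed. The function name is_trusted_image is made up for illustration, and the matching is a simplification, not the container-executor's actual logic. It assumes the setting appears as a single comma-separated key=value line.

```shell
#!/bin/bash
# Sketch only: is_trusted_image is a hypothetical helper, not part of Hadoop.
# Assumed cfg shape:
#   [docker]
#     docker.trusted.registries=local,centos
is_trusted_image() {
  local cfg="$1" image="$2" registry trusted
  # An image name without a "/" has no registry prefix to match.
  case "$image" in
    */*) registry="${image%%/*}" ;;
    *)   return 1 ;;
  esac
  # Take the last docker.trusted.registries line and strip all whitespace.
  trusted="$(sed -n 's/^[[:space:]]*docker\.trusted\.registries[[:space:]]*=//p' "$cfg" \
    | tail -n 1 | tr -d '[:space:]')"
  # Wrap both sides in commas so only whole list entries match.
  case ",$trusted," in
    *",$registry,"*) return 0 ;;
    *)               return 1 ;;
  esac
}

# Example call (the cfg path is an assumption; adjust to your install):
#   is_trusted_image /etc/hadoop/conf/container-executor.cfg "local/openjdk:8.1" \
#     && echo "trusted" || echo "not trusted"
```

If it reports "not trusted" for local/openjdk:8.1, add "local" to the list and rerun the distributedshell job.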
FYI:
https://community.cloudera.com/t5/Support-Questions/Not-able-to-run-docker-container-on-yarn-even-after/m-p/224259

On Fri, Aug 30, 2019 at 2:07 PM Yen-Onn Hiu <[email protected]> wrote:

> hi all,
>
> I have a bash script testing the docker container executor, try to
> configure the distributedshell such like below. But keep having error as
> like below.
>
> Any helps please... Thanks!
>
>
> #!/bin/bash
> export HADOOP_HOME="/usr/hdp/3.1.0.0-78/hadoop"
> export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native"
> export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
> export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
> export JAVA_LIBRARY_PATH="$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH"
> export DSHELL_JAR="/usr/hdp/3.1.0.0-78/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.2.0.jar"
> #export DOCKER_IMAGE="local/centos"
> export DOCKER_IMAGE="local/openjdk:8.1"
> export DSHELL_CMD="ls"
> export NUM_OF_CONTAINERS=1
>
> yarn --loglevel DEBUG jar $DSHELL_JAR \
> -shell_command $DSHELL_CMD \
> -jar $DSHELL_JAR \
> -shell_env YARN_CONTAINER_RUNTIME_TYPE="$RUNTIME" \
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE="$DOCKER_IMAGE" \
> -num_containers $NUM_OF_CONTAINERS
>
>
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: placementSpecs null
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: Requested
> container ask: Capability[<memory:10,
> vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution
> Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Got response from
> RM for container ask, allocatedCnt=1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Launching shell
> command on a new container.,
> containerId=container_e101_1567140885858_0043_01_000002, yarnShellId=1,
> containerNode=hk-hdpoc-2001.agprod1.agoda.local:45454,
> containerNodeURI=hk-hdpoc-2001.agprod1.agoda.local:8042,
> containerResourceMemory1024, containerResourceVirtualCores1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Setting up
> container launch container for
> containerid=container_e101_1567140885858_0043_01_000002 with shellid=1
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType:
> START_CONTAINER for Container container_e101_1567140885858_0043_01_000002
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType:
> QUERY_CONTAINER for Container container_e101_1567140885858_0043_01_000002
> 19/08/30 15:22:15 INFO distributedshell.ApplicationMaster: Got response from
> RM for container ask, completedCnt=1
> 19/08/30 15:22:15 ERROR distributedshell.ApplicationMaster:
> appattempt_1567140885858_0043_000001 got container status for
> containerID=container_e101_1567140885858_0043_01_000002, state=COMPLETE,
> exitStatus=127, diagnostics=[2019-08-30 15:22:15.671]Exception from
> container-launch.
> Container id: container_e101_1567140885858_0043_01_000002
> Exit code: 127
> Exception message: Launch container failed
> Shell output: main : command provided 4
> main : run as user is ambari-qa
> main : requested yarn user is ambari-qa
> 802b0a68c8332e819912e51eafc9527f382f48dbc91365bf5beb6ed54e14389c
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Inspecting docker container...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.Pid}}
> container_e101_1567140885858_0043_01_000002
> pid from docker inspect: 0
> Obtaining the exit code...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.ExitCode}}
> container_e101_1567140885858_0043_01_000002
> Exit code from docker inspect: 127
> Wrote the exit code 127 to
> /hadoop/yarn/local/nmPrivate/application_1567140885858_0043/container_e101_1567140885858_0043_01_000002/container_e101_1567140885858_0043_01_000002.pid.exitcode
>
>
> [2019-08-30 15:22:15.672]Container exited with a non-zero exit code 127. Last
> 4096 bytes of stderr.txt :
>
>
> [2019-08-30 15:22:15.673]Container exited with a non-zero exit code 127. Last
> 4096 bytes of stderr.txt :
>
>
>
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application
> completed. Stopping running containers
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application
> completed. Signalling finished to RM
> 19/08/30 15:22:16 INFO impl.AMRMClientImpl: Waiting for application to be
> successfully unregistered.
> 19/08/30 15:22:16 ERROR distributedshell.ApplicationMaster: Application
> Master failed. exiting
>
>
> --
> Hiu Yen Onn
