Can you test by adding "local" to docker.trusted.registries in
container-executor.cfg?

FYI:
https://community.cloudera.com/t5/Support-Questions/Not-able-to-run-docker-container-on-yarn-even-after/m-p/224259
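
A quick way to check whether the registry prefix of your image is already trusted is sketched below. This is only a rough helper, not anything shipped with Hadoop: the function name registry_is_trusted and the config path /etc/hadoop/conf/container-executor.cfg are my assumptions, so point CFG at the file your NodeManager actually reads. It assumes the usual "docker.trusted.registries=a,b,c" comma-separated form.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop): succeeds if registry $2 appears in
# the docker.trusted.registries list of the container-executor.cfg at path $1.
registry_is_trusted() {
  local cfg="$1" registry="$2" line
  # Use the last uncommented docker.trusted.registries line, if any.
  line=$(grep -E '^[[:space:]]*docker\.trusted\.registries[[:space:]]*=' "$cfg" 2>/dev/null | tail -n 1)
  [ -n "$line" ] || return 1
  # Drop the key, strip whitespace, then match within the comma-separated list.
  line=${line#*=}
  line=$(printf '%s' "$line" | tr -d '[:space:]')
  case ",$line," in
    *",$registry,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed location; override with CFG=/path/to/container-executor.cfg.
CFG="${CFG:-/etc/hadoop/conf/container-executor.cfg}"
DOCKER_IMAGE="${DOCKER_IMAGE:-local/openjdk:8.1}"
# The registry prefix is everything before the first slash ("local" here).
REGISTRY="${DOCKER_IMAGE%%/*}"

if [ -f "$CFG" ]; then
  if registry_is_trusted "$CFG" "$REGISTRY"; then
    echo "$REGISTRY is listed in docker.trusted.registries"
  else
    echo "$REGISTRY is NOT listed in docker.trusted.registries"
  fi
fi
```

If it reports that the prefix is not listed, add it to the docker.trusted.registries line on each NodeManager host and rerun your test.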

On Fri, Aug 30, 2019 at 2:07 PM Yen-Onn Hiu <[email protected]> wrote:

> hi all,
>
> I have a bash script that tests the Docker container executor by running
> the distributedshell application, configured as shown below, but I keep
> getting the error shown below.
>
> Any help would be appreciated. Thanks!
>
>
> #!/bin/bash
> export HADOOP_HOME="/usr/hdp/3.1.0.0-78/hadoop"
> export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native"
> export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
> export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
> export JAVA_LIBRARY_PATH="$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH"
> export DSHELL_JAR="/usr/hdp/3.1.0.0-78/hadoop-yarn/hadoop-yarn-applications-distributedshell-3.2.0.jar"
> #export DOCKER_IMAGE="local/centos"
> export DOCKER_IMAGE="local/openjdk:8.1"
> export DSHELL_CMD="ls"
> export NUM_OF_CONTAINERS=1
>
> yarn --loglevel DEBUG jar $DSHELL_JAR \
> -shell_command $DSHELL_CMD \
> -jar $DSHELL_JAR \
> -shell_env YARN_CONTAINER_RUNTIME_TYPE="$RUNTIME" \
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE="$DOCKER_IMAGE" \
> -num_containers $NUM_OF_CONTAINERS
>
>
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: placementSpecs null
> 19/08/30 15:22:12 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[<memory:10, 
> vCores:1>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution 
> Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, allocatedCnt=1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Launching shell 
> command on a new container., 
> containerId=container_e101_1567140885858_0043_01_000002, yarnShellId=1, 
> containerNode=hk-hdpoc-2001.agprod1.agoda.local:45454, 
> containerNodeURI=hk-hdpoc-2001.agprod1.agoda.local:8042, 
> containerResourceMemory1024, containerResourceVirtualCores1
> 19/08/30 15:22:14 INFO distributedshell.ApplicationMaster: Setting up 
> container launch container for 
> containerid=container_e101_1567140885858_0043_01_000002 with shellid=1
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_e101_1567140885858_0043_01_000002
> 19/08/30 15:22:14 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> QUERY_CONTAINER for Container container_e101_1567140885858_0043_01_000002
> 19/08/30 15:22:15 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, completedCnt=1
> 19/08/30 15:22:15 ERROR distributedshell.ApplicationMaster: 
> appattempt_1567140885858_0043_000001 got container status for 
> containerID=container_e101_1567140885858_0043_01_000002, state=COMPLETE, 
> exitStatus=127, diagnostics=[2019-08-30 15:22:15.671]Exception from 
> container-launch.
> Container id: container_e101_1567140885858_0043_01_000002
> Exit code: 127
> Exception message: Launch container failed
> Shell output: main : command provided 4
> main : run as user is ambari-qa
> main : requested yarn user is ambari-qa
> 802b0a68c8332e819912e51eafc9527f382f48dbc91365bf5beb6ed54e14389c
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Inspecting docker container...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.Pid}} 
> container_e101_1567140885858_0043_01_000002
> pid from docker inspect: 0
> Obtaining the exit code...
> Docker inspect command: /usr/bin/docker inspect --format {{.State.ExitCode}} 
> container_e101_1567140885858_0043_01_000002
> Exit code from docker inspect: 127
> Wrote the exit code 127 to 
> /hadoop/yarn/local/nmPrivate/application_1567140885858_0043/container_e101_1567140885858_0043_01_000002/container_e101_1567140885858_0043_01_000002.pid.exitcode
>
>
> [2019-08-30 15:22:15.672]Container exited with a non-zero exit code 127. Last 
> 4096 bytes of stderr.txt :
>
>
> [2019-08-30 15:22:15.673]Container exited with a non-zero exit code 127. Last 
> 4096 bytes of stderr.txt :
>
>
>
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 19/08/30 15:22:16 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finished to RM
> 19/08/30 15:22:16 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 19/08/30 15:22:16 ERROR distributedshell.ApplicationMaster: Application 
> Master failed. exiting
>
>
> --
> Hiu Yen Onn
>
>
>
