Thanks, Zhankun, for the clarification. Also, is my understanding of
*--checkpoint_path* correct, as I mentioned earlier in the thread?
Quoting the comment again:

[There is another argument called *--checkpoint_path*, which acts as
the path where all the outputs (models or datasets) produced by the
worker code running inside the Docker container are written. Hence,
*--input_path* acts as the entry point, which will be localized, and
*--checkpoint_path* acts as the exit point; both are HDFS paths that
live outside the Docker container.]
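To make that entry/exit understanding concrete, this is roughly the
shape of the command I am experimenting with. The script name
(cifar10_main.py), its flags, and the extra dataset path are
illustrative only; I am assuming that %checkpoint_path% is substituted
into the worker command the same way %input_path% is (as Zhankun
describes below), and the second dataset is appended literally, per
his suggestion for multiple inputs:

yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
  --name tf-job-002 --docker_image <your docker image> \
  --input_path hdfs://default/dataset/cifar-10-data \
  --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
  --num_workers 2 \
  --worker_resources memory=8G,vcores=2,gpu=1 \
  --worker_launch_cmd "python cifar10_main.py \
      --data-dir=%input_path% \
      --extra-data-dir=hdfs://default/dataset/cifar-10-extra \
      --job-dir=%checkpoint_path%"

If that is right, the worker reads the first dataset through the input
path, reads the second one directly from HDFS, and writes the model
back out through the checkpoint path, with both HDFS locations living
outside the container.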
I will continue my exercise with Submarine and would love to discuss
more. (I have also put a small end-to-end sketch for the
two-application scenario after the quoted thread below.)

On Mon, Feb 25, 2019 at 4:21 PM zhankun tang <[email protected]> wrote:

> Hi Vinay,
>
> IIRC, YARN sets the host's Hadoop environment in the container launch
> script by default. And in the Submarine case, the user's worker
> command is used to generate a worker script, which is invoked from
> the container launch script. If Submarine doesn't override the
> default Hadoop environment variables, HDFS reads/writes in the
> container might fail due to a missing or incorrect Hadoop location.
> So even if a Docker image is built with the correct Hadoop
> environment set, it still needs this override to use the HDFS
> libraries in a container. This is caused by YARN's Docker support,
> and Submarine is doing a workaround here.
>
> Submarine is evolving rapidly; please share your thoughts if anything
> feels uncomfortable to you.
>
> Thanks,
> Zhankun
>
> On Mon, 25 Feb 2019 at 12:22, Vinay Kashyap <[email protected]> wrote:
>
>> Hi Zhankun,
>> Thanks for the reply.
>>
>> Regarding Question 1: Okay, I understand. Let me try configuring
>> multiple input path placeholders and referring to them in the worker
>> launch command.
>>
>> Regarding Question 2:
>> What I did not understand is why YARN has to set anything related to
>> the Hadoop that runs inside the container. The Hadoop environment
>> and the worker code that reads from it are completely isolated
>> within the Docker container. In that case, the worker scripts should
>> know where HADOOP_HOME is inside the container, right? There is
>> another argument called *--checkpoint_path*, which acts as the path
>> where all the outputs (models or datasets) produced by the worker
>> code inside the Docker container are written. Hence, *--input_path*
>> acts as the entry point, which will be localized, and
>> *--checkpoint_path* acts as the exit point; both are HDFS paths that
>> live outside the Docker container. So why should YARN know the
>> Hadoop configuration that is inside the container?
>>
>> Thanks and regards
>> Vinay Kashyap
>>
>> On Fri, Feb 22, 2019 at 7:39 PM zhankun tang <[email protected]>
>> wrote:
>>
>>> Hi Vinay,
>>>
>>> For question one, IIRC, we cannot set multiple *--input_path* flags
>>> at present. "--input_path" was originally designed as a placeholder
>>> that stores a path; that path is then used to replace
>>> "%input_path%" in the worker command, as in
>>> "python worker.py -input %input_path% ..".
>>> So from this perspective, you can directly append the other input
>>> paths to your worker command in your own way.
>>>
>>> For question two, YARN might set a wrong HADOOP_COMMON_HOME by
>>> default, so Submarine provides the environment variables to be set
>>> in the worker's launch script if the worker wants to access HDFS.
>>> And there is no data-plane relation between the outside Hadoop and
>>> the container, except that YARN localizes resources for the
>>> container.
>>>
>>> Hope this answers your questions.
>>>
>>> Best Regards,
>>> Zhankun
>>>
>>> On Fri, 22 Feb 2019 at 15:35, Vinay Kashyap <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am using Hadoop 3.2.0. I am trying a few examples using Submarine
>>>> to run TensorFlow jobs in a Docker container.
>>>> I would like to understand a few details about reading/writing
>>>> HDFS data during/after application launch/execution. I have
>>>> highlighted the questions inline.
>>>>
>>>> When launching an application that reads input from HDFS, we
>>>> configure *--input_path* to an HDFS path, as in the standard
>>>> example:
>>>>
>>>> yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
>>>>   --name tf-job-001 --docker_image <your docker image> \
>>>>   --input_path hdfs://default/dataset/cifar-10-data \
>>>>   --checkpoint_path hdfs://default/tmp/cifar-10-jobdir \
>>>>   --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
>>>>   --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
>>>>   --num_workers 2 \
>>>>   --worker_resources memory=8G,vcores=2,gpu=1 \
>>>>   --worker_launch_cmd "cmd for worker ..." \
>>>>   --num_ps 2 \
>>>>   --ps_resources memory=4G,vcores=2,gpu=0 \
>>>>   --ps_launch_cmd "cmd for ps"
>>>>
>>>> *Question 1: What if I have more than one dataset in separate HDFS
>>>> paths? Can --input_path take multiple paths in any fashion, or is
>>>> it expected that all the datasets are kept under one path?*
>>>>
>>>> "DOCKER_JAVA_HOME points to JAVA_HOME inside the Docker image"
>>>> and "DOCKER_HADOOP_HDFS_HOME points to HADOOP_HDFS_HOME inside the
>>>> Docker image".
>>>>
>>>> *Question 2: What is the exact expectation here? That is, is there
>>>> any relation to the Hadoop running outside the Docker container? I
>>>> guess reading HDFS data into the Docker container happens during
>>>> container localization, but how does the output get written back
>>>> to the HDFS running outside the Docker container?*
>>>>
>>>> Assume a scenario where Application 1 creates a model and
>>>> Application 2 performs scoring, with the two applications running
>>>> in separate Docker containers. I would like to understand how data
>>>> is read and written across the applications in this case.
>>>> It would be of great help if anyone could guide me in
>>>> understanding this or point me to a blog or write-up that explains
>>>> the above.
>>>>
>>>> *Thanks and regards*
>>>> *Vinay Kashyap*
>>>
>>
>> --
>> *Thanks and regards*
>> *Vinay Kashyap*
>>
>
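P.S. Putting Zhankun's two answers together with my original
two-application scenario (Application 1 trains, Application 2 scores),
this is the end-to-end flow I intend to verify. It reuses the command
format from my first mail; the image names and the model path are
illustrative, and the assumption that the scoring job can point its
--input_path at the directory the training job used as its
--checkpoint_path is mine, not something confirmed above:

# Application 1: training. The worker writes its model under the
# HDFS checkpoint path, which lives outside the container.
yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
  --name tf-train-001 --docker_image <train image> \
  --input_path hdfs://default/dataset/cifar-10-data \
  --checkpoint_path hdfs://default/models/cifar-10 \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
  ...

# Sanity check from outside any container: the model should be
# visible in plain HDFS once the training job finishes.
hdfs dfs -ls hdfs://default/models/cifar-10

# Application 2: scoring, in a different container, reading the
# model that Application 1 produced.
yarn jar hadoop-yarn-applications-submarine-<version>.jar job run \
  --name tf-score-001 --docker_image <score image> \
  --input_path hdfs://default/models/cifar-10 \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
  ...

If I have this right, the only data-plane hand-off between the two
applications is HDFS itself, and the DOCKER_* overrides exist only so
that each worker's HDFS client can find a working Hadoop inside its
own image. Please correct me if that mental model is off.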
--
*Thanks and regards*
*Vinay Kashyap*