On 2/14/19 11:09 PM, Vinay Kashyap wrote:
I am running hadoop on my mac and all the folders have *myuser:staff* as the owner. I have verified the permissions for the local dirs to be 755.

This doesn't sound right. By-the-book, there are supposed to be separate "users" for hdfs, yarn, and mapred to run their respective daemons. The directories they read/write in are supposed to be permed and owned to expect that. One possible approach for purposes of log-writing etc. is to put those user accounts in a group (perhaps named "hadoop") so that read/written areas in common are owned by that group and permed accordingly.

If you're going to ad-lib that arrangement then you'll have to ad-lib a lot of the rest of how worker nodes and edge nodes behave accordingly.

I run all Hadoop services as myuser, and I have configured /yarn.nodemanager.linux-container-executor.group/=*staff* accordingly in both *yarn-site.xml* and *container-executor.cfg*.
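
Since the group has to be set identically in both files, a quick cross-check can rule out a mismatch. The following is only a minimal sketch: it assumes the usual `<property><name>/<value>` layout of yarn-site.xml and plain `key=value` lines in container-executor.cfg, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

KEY = "yarn.nodemanager.linux-container-executor.group"


def group_from_yarn_site(xml_text):
    """Return the value of KEY from yarn-site.xml content, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == KEY:
            return (prop.findtext("value") or "").strip()
    return None


def group_from_executor_cfg(cfg_text):
    """Return the value of KEY from container-executor.cfg content, or None."""
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and section headers such as [docker].
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == KEY:
            return value.strip()
    return None


def groups_match(xml_text, cfg_text):
    """True only if both files define KEY and the values are equal."""
    a = group_from_yarn_site(xml_text)
    b = group_from_executor_cfg(cfg_text)
    return a is not None and a == b
```

Reading the two files from the Hadoop configuration directory and passing their contents to `groups_match` should return `True` when both say `staff`.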

1. Is the container-executor binary certified to work as expected on OS X?
2. When the Linux container executor is configured, is there a hard expectation that the users running the Hadoop services are among [*root, hdfs, yarn...*] and that the group is *hadoop*, so that the directory permissions fall in line accordingly?

Can you please help me understand this? I could not find any write-up on it.

On Thu, Feb 14, 2019 at 11:13 PM Prabhu Josephraj wrote:

    In the case of a Distributed Shell job, the ApplicationMaster runs in a
    normal Linux container and the subsequent shell command runs inside a
    Docker container. The job fails even before launching the AM, that is,
    before starting the Docker container. I think the Distributed Shell job
    will fail even without the Docker settings.

    Going by error code 20, the failure is most likely related to accessing
    the NM local directory:

    https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

    20    INITIALIZE_USER_FAILED    Couldn't get, stat, or secure the per-user NodeManager directory.


    Can you try the steps below on all NodeManager machines?

    Remove all contents under /data/yarn and make sure the /data and
    /data/yarn directories have permission 755 with owner root:root, and
    that the local directory is owned by yarn:hadoop.

    [root@tparimi-tarunhdp26-4 ~]# ls -lrt /
    drwxr-xr-x.   5 root root    44 Oct 24 11:47 data

    [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
    drwxr-xr-x. 4 root      root   28 Oct 24 14:30 yarn

    [root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
    total 4
    drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
    drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log
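
The same check can be scripted so it is easy to repeat on every NodeManager. This is a minimal sketch, not part of Hadoop: the expected values are copied from the listing above, and the helper names are hypothetical.

```python
import grp
import os
import pwd
import stat


def describe(path):
    """Return (owner, group, octal mode string) for path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid without a group entry
        group = str(st.st_gid)
    mode = format(stat.S_IMODE(st.st_mode), "o")
    return owner, group, mode


def mismatches(path, owner, group, mode):
    """Return one message per attribute that differs from what is expected."""
    labels = ("owner", "group", "mode")
    actual = describe(path)
    expected = (owner, group, mode)
    return [
        f"{path}: {label} is {a}, expected {e}"
        for label, a, e in zip(labels, actual, expected)
        if a != e
    ]


# Expected values taken from the ls output above.
EXPECTED = {
    "/data": ("root", "root", "755"),
    "/data/yarn": ("root", "root", "755"),
    "/data/yarn/local": ("yarn", "hadoop", "755"),
    "/data/yarn/log": ("yarn", "hadoop", "775"),
}

if __name__ == "__main__":
    for path, (owner, group, mode) in EXPECTED.items():
        if not os.path.exists(path):
            print(f"{path}: missing")
            continue
        for problem in mismatches(path, owner, group, mode):
            print(problem)
```

No output for a path means it matches the expected owner, group, and mode.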

    Also check whether Distributed Shell jobs run fine without the Docker
    settings.





    On Thu, Feb 14, 2019 at 10:15 PM Vinay Kashyap wrote:

        Hi Prabhu,

        Thanks for your reply.
        I tried the configuration you suggested, but I get the same error.
        Is this related to container localization by any chance?
        Also, is there any log or output that shows the Docker container
        runtime has been picked up?



        On Thu, Feb 14, 2019 at 9:38 PM Prabhu Josephraj wrote:

            Hi Vinay,

            Can you try specifying the configs below under the Docker
            section of container-executor.cfg? They allow Docker
            containers to use the NM local dirs.

            docker.allowed.ro-mounts=/data/yarn/local,/usr/jdk64/jdk1.8.0_112/bin
            docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log
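
Before pasting comma-separated mount lists like the two above into container-executor.cfg, it may help to sanity-check them. The sketch below only flags empty entries (for example from a doubled comma) and entries that are not absolute paths; how container-executor itself treats such entries is not something this check knows, and `parse_mounts` is a hypothetical helper.

```python
def parse_mounts(value):
    """Split a comma-separated mount list.

    Returns (mounts, problems): the non-empty entries in order, and a list
    of human-readable warnings about suspicious entries.
    """
    entries = [entry.strip() for entry in value.split(",")]
    problems = []
    for index, entry in enumerate(entries):
        if not entry:
            problems.append(f"entry {index} is empty")
        elif not entry.startswith("/"):
            problems.append(f"entry {index} is not an absolute path: {entry}")
    mounts = [entry for entry in entries if entry]
    return mounts, problems


if __name__ == "__main__":
    mounts, problems = parse_mounts("/data/yarn/local,/data/yarn/log")
    print(mounts)
    print(problems)
```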

            Thanks,
            Prabhu Joseph

            On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap wrote:


                I am using Hadoop 3.2.0 and trying to run a simple
                application in a Docker container. I have made the
                required configuration changes in both
                */yarn-site.xml/* and */container-executor.cfg/* to
                select LinuxContainerExecutor and the Docker runtime.

                I am using the distributed shell example from one of the
                Hortonworks blogs:
                https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/

                The problem is that when the application is submitted to
                YARN, it fails with a directory-creation error:

                    2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application report from ASM for, appId=2, clientToAMToken=null, appDiagnostics=Application application_1550156488785_0002 failed 2 times due to AM Container for appattempt_1550156488785_0002_000002 exited with exitCode: -1000
                    Failing this attempt.Diagnostics: [2019-02-14 20:51:16.282]Application application_1550156488785_0002 initialization failed (exitCode=20) with output:
                    main : command provided 0
                    main : user is myuser
                    main : requested yarn user is myuser
                    Failed to create directory /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser - Not a directory

                I have configured *yarn.nodemanager.local-dirs* in
                yarn-site.xml and I can see the same reflected in YARN
                web ui *localhost:8088/conf*

                <property>
                  <name>yarn.nodemanager.local-dirs</name>
                  <value>/data/yarn/local</value>
                  <final>false</final>
                  <source>yarn-site.xml</source>
                </property>

                I do not understand why it is trying to create the
                usercache dir inside the nmPrivate directory.

                Note: I have verified myuser's permissions on the
                directories and have also tried clearing the directories
                manually, as suggested in a related post, but to no avail.
                I do not see any additional information about the
                container launch failure in any other logs.

                How do I debug why the usercache dir is not resolved
                properly?
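
One way to narrow this down: "Not a directory" (ENOTDIR) is what directory creation reports when some component of the path prefix exists but is not a directory. In the path from the log, the `container_..._000001.tokens` component looks like a token file rather than a directory, which would fit that error. The sketch below walks a path and reports the first such component; it is only a diagnostic aid with a hypothetical function name, and it does not explain why the executor built that path.

```python
import os


def first_non_directory(path):
    """Return the first existing path component that is not a directory.

    Returns None if every existing component is a directory (components
    that do not exist yet are fine, since they could still be created).
    """
    path = os.path.abspath(path)
    current = os.sep
    for part in path.split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if not os.path.lexists(current):
            return None  # nothing beyond this point exists yet
        if not os.path.isdir(current):
            return current
    return None


if __name__ == "__main__":
    # Path copied from the error message in the log above.
    target = (
        "/data/yarn/local/nmPrivate/"
        "container_1550156488785_0002_02_000001.tokens/usercache/myuser"
    )
    print(first_non_directory(target))
```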

                Really appreciate any help on this.

                Thanks

                Vinay Kashyap



        --
        */Thanks and regards/*
        */Vinay Kashyap/*




