Hi Harsh,
I have set yarn.nodemanager.resource.memory-mb to 1200 MB. Also, does it
matter if I run the jobs as "root" while the RM and NM services are
running as the "yarn" user? I have already created the /user/root
directory for the root user in HDFS.
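
For reference, this is roughly how I created it (using the hdfs superuser
from the CDH packaging; adjust the user and paths if yours differ):

  sudo -u hdfs hadoop fs -mkdir -p /user/root
  sudo -u hdfs hadoop fs -chown root:root /user/root
  # the AM staging dir from my yarn-site.xml (/user) also needs to be
  # accessible to the submitting user:
  sudo -u hdfs hadoop fs -ls /user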
Here is the yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>List of directories to store localized files
in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/disk/yarn/local</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/disk/yarn/logs</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,$YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ihub-an-l1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ihub-an-l1:8040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ihub-an-l1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>ihub-an-l1:8141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ihub-an-l1:8088</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/disk/mapred/jobhistory/intermediate/done</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/disk/mapred/jobhistory/done</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>ihub-an-l1:9999</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1200</value>
</property>
</configuration>
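
As mentioned below, the RM web UI (http://ihub-an-l1:8088/cluster/nodes)
still shows all 8 NodeManagers connected. I believe the CLI equivalent on
this version is something like:

  yarn node -list
  # should print each registered NM with its state and container count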
On Fri, Jul 27, 2012 at 2:23 PM, Harsh J <[email protected]> wrote:
> Can you share your yarn-site.xml contents? Have you tweaked memory
> sizes in there?
>
> On Fri, Jul 27, 2012 at 11:53 PM, anil gupta <[email protected]>
> wrote:
> > Hi All,
> >
> > I have a Hadoop 2.0 alpha (CDH4) Hadoop/HBase cluster running on
> > CentOS6.0. The cluster has 4 admin nodes and 8 data nodes. I have the RM
> > and History server running on one machine. RM web interface shows that 8
> > Nodes are connected to it. I installed this cluster with HA capability
> and
> > I have already tested HA for Namenodes, ZK, HBase Master. I am running
> the
> > pi example mapreduce job with user "root" and I have created "/user/root"
> > directory in HDFS.
> >
> > Last few lines of one of the nodemanager:
> > 2012-07-26 21:58:38,745 INFO org.mortbay.log: Extract
> >
> jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.0.0-cdh4.0.0.jar!/webapps/node
> > to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
> > 2012-07-26 21:58:38,907 INFO org.mortbay.log: Started
> > [email protected]:8042
> > 2012-07-26 21:58:38,907 INFO org.apache.hadoop.yarn.webapp.WebApps: Web
> app
> > /node started at 8042
> > 2012-07-26 21:58:38,919 INFO org.apache.hadoop.yarn.webapp.WebApps:
> > Registered webapp guice modules
> > 2012-07-26 21:58:38,919 INFO
> > org.apache.hadoop.yarn.service.AbstractService:
> > Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is
> > started.
> > 2012-07-26 21:58:38,919 INFO
> > org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is
> > started.
> > 2012-07-26 21:58:38,922 INFO
> > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
> Connected
> > to ResourceManager at ihub-an-l1/172.31.192.151:8025
> > 2012-07-26 21:58:38,924 INFO
> > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
> Registered
> > with ResourceManager as ihub-dn-l2:53199 with total resource of memory:
> 1200
> > 2012-07-26 21:58:38,924 INFO
> > org.apache.hadoop.yarn.service.AbstractService:
> > Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl
> is
> > started.
> > 2012-07-26 21:58:38,929 INFO
> > org.apache.hadoop.yarn.service.AbstractService:
> > Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
> > *2012-07-26 21:58:38,929 INFO
> > org.apache.hadoop.yarn.service.AbstractService:
> > Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl
> is
> > stopped.*
> >
> > Why is the NodeStatusUpdaterImpl stopped?
> >
> > Here is the last few lines of the RM:
> > 2012-07-27 09:38:24,644 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
> > new applicationId: 2
> > 2012-07-27 09:38:25,310 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService:
> Application
> > with id 2 submitted by user root
> > 2012-07-27 09:38:25,310 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
> > IP=172.31.192.51 OPERATION=Submit Application Request
> > TARGET=ClientRMService RESULT=SUCCESS
> APPID=application_1343365114818_0002
> > 2012-07-27 09:38:25,310 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> > application_1343365114818_0002 State change from NEW to SUBMITTED
> > 2012-07-27 09:38:25,311 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> > Registering appattempt_1343365114818_0002_000001
> > 2012-07-27 09:38:25,311 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> > appattempt_1343365114818_0002_000001 State change from NEW to SUBMITTED
> > 2012-07-27 09:38:25,311 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> > Application Submission: application_1343365114818_0002 from root,
> currently
> > active: 1
> > 2012-07-27 09:38:25,311 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> > appattempt_1343365114818_0002_000001 State change from SUBMITTED to
> > SCHEDULED
> > 2012-07-27 09:38:25,311 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> > application_1343365114818_0002 State change from SUBMITTED to ACCEPTED
> >
> > The Pi example job has been stuck for the last hour. Why is it not
> > trying to start tasks in the NMs?
> >
> > Here is the command i fired to run the job:
> > [root@ihub-nn-a1 hadoop-yarn]# hadoop --config /etc/hadoop/conf/ jar
> > /usr/lib/hadoop-mapreduce/hadoop-*-examples.jar pi 10 100000
> > Number of Maps = 10
> > Samples per Map = 100000
> > Wrote input for Map #0
> > Wrote input for Map #1
> > Wrote input for Map #2
> > Wrote input for Map #3
> > Wrote input for Map #4
> > Wrote input for Map #5
> > Wrote input for Map #6
> > Wrote input for Map #7
> > Wrote input for Map #8
> > Wrote input for Map #9
> > Starting Job
> > 12/07/27 09:38:27 INFO input.FileInputFormat: Total input paths to
> process
> > : 10
> > 12/07/27 09:38:27 INFO mapreduce.JobSubmitter: number of splits:10
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.jar is deprecated.
> > Instead, use mapreduce.job.jar
> > 12/07/27 09:38:27 WARN conf.Configuration:
> > mapred.map.tasks.speculative.execution is deprecated. Instead, use
> > mapreduce.map.speculative
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.reduce.tasks is
> > deprecated. Instead, use mapreduce.job.reduces
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.output.value.class is
> > deprecated. Instead, use mapreduce.job.output.value.class
> > 12/07/27 09:38:27 WARN conf.Configuration:
> > mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
> > mapreduce.reduce.speculative
> > 12/07/27 09:38:27 WARN conf.Configuration: mapreduce.map.class is
> > deprecated. Instead, use mapreduce.job.map.class
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.job.name is
> deprecated.
> > Instead, use mapreduce.job.name
> > 12/07/27 09:38:27 WARN conf.Configuration: mapreduce.reduce.class is
> > deprecated. Instead, use mapreduce.job.reduce.class
> > 12/07/27 09:38:27 WARN conf.Configuration: mapreduce.inputformat.class is
> > deprecated. Instead, use mapreduce.job.inputformat.class
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.input.dir is
> deprecated.
> > Instead, use mapreduce.input.fileinputformat.inputdir
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.output.dir is
> deprecated.
> > Instead, use mapreduce.output.fileoutputformat.outputdir
> > 12/07/27 09:38:27 WARN conf.Configuration: mapreduce.outputformat.class
> is
> > deprecated. Instead, use mapreduce.job.outputformat.class
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.map.tasks is
> deprecated.
> > Instead, use mapreduce.job.maps
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.output.key.class is
> > deprecated. Instead, use mapreduce.job.output.key.class
> > 12/07/27 09:38:27 WARN conf.Configuration: mapred.working.dir is
> > deprecated. Instead, use mapreduce.job.working.dir
> > 12/07/27 09:38:27 INFO mapred.ResourceMgrDelegate: Submitted application
> > application_1343365114818_0002 to ResourceManager at ihub-an-l1/
> > 172.31.192.151:8040
> > 12/07/27 09:38:27 INFO mapreduce.Job: The url to track the job:
> > http://ihub-an-l1:9999/proxy/application_1343365114818_0002/
> > 12/07/27 09:38:27 INFO mapreduce.Job: Running job: job_1343365114818_0002
> >
> > No MapReduce tasks are started by the cluster. I don't see any errors
> > anywhere in the application. Please help me in resolving this problem.
> >
> > Thanks,
> > Anil Gupta
>
>
>
> --
> Harsh J
>
--
Thanks & Regards,
Anil Gupta