Hi everybody

I'm trying to run the wordcount example with 1000 copies of a Project Gutenberg file, downloaded and extracted from http://www.gutenberg.lib.md.us/1/0/0/0/10001/10001.zip, like this:

$ cd /home/input
$ for i in `seq 1 999`; do cp 0.txt $i.txt; done
$ hdfs namenode -format
$ start-dfs.sh
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hduser
$ hdfs dfs -put /home/input/ input
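As a sanity check on the multiplication step itself, here is a local stand-in (the scratch directory and dummy file contents are made up; the real run uses /home/input and the extracted Gutenberg text) showing that the loop plus the original 0.txt yields exactly 1000 files:

```shell
# Stand-in for the copy loop above: a temp dir and a dummy 0.txt
# instead of /home/input and the real Gutenberg file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'some sample text\n' > 0.txt
for i in $(seq 1 999); do cp 0.txt "$i.txt"; done
ls | wc -l   # 0.txt plus 999 copies = 1000 files
```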

As you can see from http://hdmaster:50070/dfshealth.html#tab-datanode, the multiplied Gutenberg file has been successfully uploaded 1000 times to HDFS:

Node:            hdmaster (10.10.10.10:50010)
Last contact:    1
Admin State:     In Service
Capacity:        238.32 GB
Used:            56.11 MB
Non DFS Used:    48.73 GB
Remaining:       189.53 GB
Blocks:          1000
Block pool used: 56.11 MB (0.02%)
Failed Volumes:  0
Version:         2.6.4

When I run the wordcount example without YARN in a pseudo-distributed cluster, it works (and is fast):

$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output

16/09/04 18:18:29 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:29 INFO reduce.MergeManagerImpl: finalMerge called with 1000 in-memory map-outputs and 0 on-disk map-outputs
16/09/04 18:18:29 INFO mapred.Merger: Merging 1000 sorted segments
16/09/04 18:18:29 INFO mapred.Merger: Down to the last merge-pass, with 1000 segments left of total size: 40716000 bytes
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merged 1000 segments, 40722000 bytes to disk to satisfy reduce memory limit
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 1 files, 40720006 bytes from disk
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/09/04 18:18:31 INFO mapred.Merger: Merging 1 sorted segments
16/09/04 18:18:31 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 40719996 bytes
16/09/04 18:18:31 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:31 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/09/04 18:18:33 INFO mapred.Task: Task:attempt_local1736492896_0001_r_000000_0 is done. And is in the process of committing
16/09/04 18:18:33 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:33 INFO mapred.Task: Task attempt_local1736492896_0001_r_000000_0 is allowed to commit now
16/09/04 18:18:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1736492896_0001_r_000000_0' to hdfs://hdmaster:54310/user/hduser/output/_temporary/0/task_local1736492896_0001_r_000000
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce > reduce
16/09/04 18:18:33 INFO mapred.Task: Task 'attempt_local1736492896_0001_r_000000_0' done.
16/09/04 18:18:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local1736492896_0001_r_000000_0
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce task executor complete.
16/09/04 18:18:33 INFO mapreduce.Job:  map 100% reduce 100%
16/09/04 18:18:33 INFO mapreduce.Job: Job job_local1736492896_0001 completed successfully
16/09/04 18:18:33 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=2865334368
                FILE: Number of bytes written=21137755248
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=26333765000
                HDFS: Number of bytes written=37800
                HDFS: Number of read operations=1007007
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=1003
        Map-Reduce Framework
                Map input records=958000
                Map output records=8807000
                Map output bytes=85735000
                Map output materialized bytes=40726000
                Input split bytes=111890
                Combine input records=8807000
                Combine output records=3035000
                Reduce input groups=3035
                Reduce shuffle bytes=40726000
                Reduce input records=3035000
                Reduce output records=3035
                Spilled Records=6070000
                Shuffled Maps =1000
                Failed Shuffles=0
                Merged Map outputs=1000
                GC time elapsed (ms)=6358
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=1586989891584
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=52510000
        File Output Format Counters
                Bytes Written=37800
90.20user 2.64system 1:01.70elapsed 150%CPU (0avgtext+0avgdata 1821840maxresident)k
0inputs+170688outputs (0major+61251minor)pagefaults 0swaps


When I add the minimal settings to mapred-site.xml to run the program in a pseudo-distributed YARN cluster:

mapred-site.xml:

   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <description>The framework for running mapreduce jobs</description>
   </property>

and yarn-site.xml:

  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
  </property>

the wordcount program randomly exits and never finishes. I've tried different Hadoop versions (2.6.4, 2.7.2, 2.7.3), on both an up-to-date Ubuntu 16.04 release and an up-to-date Gentoo Linux system, and with Oracle JDK 1.7 and 1.8, all to no avail.
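For completeness: I have not set any YARN resource limits, so the defaults apply. If I read the docs correctly, these are the relevant properties with their default values (NOT present in my yarn-site.xml, listed here only for reference):

```xml
<!-- Defaults, not set in my yarn-site.xml -->
<property>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>8192</value>
</property>
<property>
   <name>yarn.scheduler.minimum-allocation-mb</name>
   <value>1024</value>
</property>
<property>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>8192</value>
</property>
```

The 1024 MB minimum allocation matches the "1 GB physical memory" per container reported in the NodeManager log below.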

I cannot figure out how to debug this; I've now spent days investigating.

$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output

Below is the output after about 190 reduce operations. As you can see, the whole cluster comes down (the login shell gets terminated as well). Sometimes the cluster is terminated after just a handful of operations.

As you can see from the dstat output captured while executing the wordcount example, memory is not the problem:

$ dstat -a -m

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage-----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free
97 3 0 0 0 0| 0 1316k| 18k 776B| 0 0 |3284 10k|8396M 119M 4338M 18.7G
96 4 0 0 0 0| 0 0 | 241B 152B| 0 0 |3182 11k|8705M 119M 4339M 18.4G
96 4 0 0 0 0| 0 936k| 582B 316B| 0 0 |3288 15k|8251M 119M 4339M 18.9G
96 4 0 0 0 0| 52k 628k| 518B 316B| 0 0 |3141 10k|8326M 119M 4339M 18.8G
97 3 0 0 0 0| 0 32k| 652B 450B| 0 0 |3081 9674 |8490M 119M 4339M 18.7G
97 3 0 0 0 0| 52k 1372k| 393B 292B| 0 0 |3115 12k|8514M 119M 4340M 18.6G
96 4 0 0 0 0| 0 184k| 0 70B| 0 0 |3107 11k|8291M 119M 4340M 18.8G
97 3 0 0 0 0| 0 72k| 847B 474B| 0 0 |3156 8630 |8217M 119M 4344M 18.9G
97 3 0 0 0 0| 0 144k| 546B 298B| 0 0 |3176 7967 |8048M 119M 4345M 19.1G
97 3 0 0 0 0| 0 836k| 399B 298B| 0 0 |3228 11k|8231M 119M 4345M 18.9G
97 3 0 0 0 0| 0 1448k| 70B 70B| 0 0 |3096 16k|8469M 119M 4345M 18.7G
96 4 0 0 0 0| 0 88k| 259B 228B| 0 0 |3197 13k|8387M 119M 4346M 18.7G
97 3 0 0 0 0| 0 144k| 917B 544B| 0 0 |3044 7261 |8293M 119M 4346M 18.8G
96 4 0 0 0 0| 0 72k| 259B 158B| 0 0 |3231 12k|8342M 119M 4346M 18.8G
97 3 0 0 0 0| 0 828k| 518B 316B| 0 0 |3091 9285 |8828M 119M 4347M 18.3G
97 3 0 0 0 0| 0 2884k| 0 0 | 0 0 |3082 11k|8511M 119M 4347M 18.6G
96 4 0 0 0 0| 0 188k| 259B 158B| 0 0 |3105 24k|8303M 119M 4347M 18.8G
96 4 0 0 0 0| 0 184k| 518B 316B| 0 0 |3194 9256 |8161M 119M 4347M 19.0G
97 3 0 0 0 0| 52k 40k| 259B 158B| 0 0 |3014 8124 |8503M 119M 4348M 18.6G
97 3 0 0 0 0| 52k 812k| 0 0 | 0 0 |3217 23k|8531M 119M 4348M 18.6G
95 4 0 0 0 0| 0 1692k| 329B 428B| 0 0 |3345 18k|7954M 119M 4348M 19.2G
17 5 78 0 0 0| 0 616k| 288B 213B| 0 0 |2349 11k|3462M 119M 4347M 23.6G
0 0 99 0 0 0| 0 1012k| 0 0 | 0 0 | 290 640 |3462M 119M 4347M 23.6G
0 0 100 0 0 0| 0 0 | 0 0 | 0 0 | 340 979 |3462M 119M 4347M 23.6G


Here's an excerpt from yarn-hduser-nodemanager-hdmaster.log:

2016-09-04 17:22:44,012 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1473002371122_0001_01_000194 by user hduser
2016-09-04 17:22:44,012 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000194
2016-09-04 17:22:44,015 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1473002371122_0001_01_000194 to application application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from NEW to LOCALIZING
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2016-09-04 17:22:44,016 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from LOCALIZING to LOCALIZED
2016-09-04 17:22:44,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1473002371122_0001_01_000193 by user hduser
2016-09-04 17:22:44,036 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000193
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1473002371122_0001_01_000193 to application application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from NEW to LOCALIZING
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2016-09-04 17:22:44,037 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from LOCALIZING to LOCALIZED
2016-09-04 17:22:44,097 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from LOCALIZED to RUNNING
2016-09-04 17:22:44,112 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000194/default_container_executor.sh]
2016-09-04 17:22:44,118 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1354 for container-id container_1473002371122_0001_01_000179: 167.5 MB of 1 GB physical memory used; 683.8 MB of 6 GB virtual memory used
2016-09-04 17:22:44,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from LOCALIZED to RUNNING
2016-09-04 17:22:44,191 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000193/default_container_executor.sh]
2016-09-04 17:22:44,212 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1473002371122_0001_000001 (auth:SIMPLE)
2016-09-04 17:22:44,213 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1473002371122_0001_01_000174
2016-09-04 17:22:44,213 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000174
2016-09-04 17:22:44,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000174 transitioned from RUNNING to KILLING
2016-09-04 17:22:44,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000174
2016-09-04 17:22:44,245 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000181 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000183 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000189 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000191 is : 143
2016-09-04 17:22:44,247 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000190 is : 143
2016-09-04 17:22:44,247 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000182 is : 143
2016-09-04 17:22:44,248 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000186 is : 143
2016-09-04 17:22:44,252 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000192 is : 143
2016-09-04 17:22:44,255 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000180 is : 143
2016-09-04 17:22:44,255 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000187 is : 143
2016-09-04 17:22:44,270 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000185 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000184 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000188 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000179 is : 143
2016-09-04 17:22:44,282 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000001 is : 143
2016-09-04 17:22:44,282 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000194 is : 143
2016-09-04 17:22:44,296 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000178 is : 143
2016-09-04 17:22:44,296 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000175 is : 143
2016-09-04 17:22:44,307 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000176 is : 143
2016-09-04 17:22:44,308 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000174 is : 143
2016-09-04 17:22:44,316 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000177 is : 143
2016-09-04 17:22:44,323 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,348 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000076 is : 143
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000181 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000189 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000183 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000191 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000190 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000182 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000186 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000192 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000180 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000187 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000185 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000184 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000188 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000179 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000178 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000175 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000176 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000174 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000177 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000076 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000181
2016-09-04 17:22:44,389 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000193 is : 143
2016-09-04 17:22:44,394 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1579 for container-id container_1473002371122_0001_01_000183: 135.3 MB of 1 GB physical memory used; 682.6 MB of 6 GB virtual memory used
2016-09-04 17:22:44,399 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,407 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1722 for container-id container_1473002371122_0001_01_000186: 0B of 1 GB physical memory used; 0B of 6 GB virtual memory used
2016-09-04 17:22:44,414 INFO org.mortbay.log: Stopped [email protected]:8042
2016-09-04 17:22:44,415 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000189
2016-09-04 17:22:44,420 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,420 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1265 for container-id container_1473002371122_0001_01_000176: 0B of 1 GB physical memory used; 0B of 6 GB virtual memory used
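A side note on the "is : 143" lines above: the container launcher script reports exit codes as 128 plus the signal number, so 143 means the container JVMs were terminated by SIGTERM (15), consistent with the NodeManager itself logging "RECEIVED SIGNAL 15". A quick shell check of the arithmetic:

```shell
# Exit code 143 = 128 + 15, i.e. the process was killed by signal 15 (SIGTERM)
echo $(( 143 - 128 ))   # prints 15
kill -l 15              # prints TERM
```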


Here's the output from running:

$ hadoop org.apache.hadoop.conf.Configuration


<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>20000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries.on.timeouts</name><value>45</value><source>core-default.xml</source></property>
<property><name>hadoop.user.group.static.mapping.overrides</name><value>dr.who=;</value><source>core-default.xml</source></property>
<property><name>hadoop.tmp.dir</name><value>/home/hduser/tmp</value><source>core-site.xml</source></property>
<property><name>hadoop.security.java.secure.random.algorithm</name><value>SHA1PRNG</value><source>core-default.xml</source></property>
<property><name>nfs.exports.allowed.hosts</name><value>* rw</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.check-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>ipc.client.idlethreshold</name><value>4000</value><source>core-default.xml</source></property>
<property><name>fs.trash.checkpoint.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>io.skip.checksum.errors</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.negative-cache.secs</name><value>30</value><source>core-default.xml</source></property>
<property><name>fs.har.impl.disable.cache</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.defaultFS</name><value>hdfs://hdmaster:54310</value><source>core-site.xml</source></property>
<property><name>fs.client.resolve.remote.symlinks</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.StandardSocketFactory</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.authentication.retry-count</name><value>1</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value><source>core-default.xml</source></property>
<property><name>net.topology.impl</name><value>org.apache.hadoop.net.NetworkTopology</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.require.client.cert</name><value>false</value><source>core-default.xml</source></property>
<property><name>io.bytes.per.checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>file.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.new-active.rpc-timeout.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.acl</name><value>world:anyone:rwcda</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host.port</name><value>21</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><source>core-default.xml</source></property>
<property><name>s3.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>net.topology.node.switch.mapping.impl</name><value>org.apache.hadoop.net.ScriptBasedMapping</value><source>core-default.xml</source></property>
<property><name>fs.s3.buffer.dir</name><value>${hadoop.tmp.dir}/s3</value><source>core-default.xml</source></property>
<property><name>s3native.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.error.rate</name><value>0.005</value><source>core-default.xml</source></property>
<property><name>ftp.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.group.name</name><value>cn</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.rpc-timeout.ms</name><value>45000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authorization</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>ipc.client.fallback-to-simple-auth-allowed</name><value>false</value><source>core-default.xml</source></property>
<property><name>ipc.server.listen.queue.size</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled.protocols</name><value>TLSv1</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.low-watermark</name><value>0.3f</value><source>core-default.xml</source></property>
<property><name>s3native.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>file.replication</name><value>1</value><source>core-default.xml</source></property>
<property><name>ftp.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>hadoop.work.around.non.threadsafe.getpwuid</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.du.interval</name><value>600000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.type</name><value>simple</value><source>core-default.xml</source></property>
<property><name>hadoop.http.staticuser.user</name><value>dr.who</value><source>core-default.xml</source></property>
<property><name>hadoop.util.hash.type</name><value>murmur</value><source>core-default.xml</source></property>
<property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.size</name><value>500</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.maximum</name><value>15</value><source>core-default.xml</source></property>
<property><name>fs.s3a.attempts.maximum</name><value>10</value><source>core-default.xml</source></property>
<property><name>io.map.index.interval</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.expiry</name><value>43200000</value><source>core-default.xml</source></property>
<property><name>hadoop.kerberos.kinit.command</name><value>kinit</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.hdfs.impl</name><value>org.apache.hadoop.fs.Hdfs</value><source>core-default.xml</source></property>
<property><name>io.map.index.skip</name><value>0</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.token.validity</name><value>36000</value><source>core-default.xml</source></property>
<property><name>hadoop.jetty.logs.serve.aliases</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>io.compression.codec.bzip2.library</name><value>system-native</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.connection.retries</name><value>1</value><source>core-default.xml</source></property>
<property><name>fs.swift.impl</name><value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.sleep-after-disconnect.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.timeout</name><value>5000</value><source>core-default.xml</source></property>
<property><name>ipc.client.rpc-timeout.ms</name><value>0</value><source>core-default.xml</source></property>
<property><name>file.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.viewfs.impl</name><value>org.apache.hadoop.fs.viewfs.ViewFs</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.group</name><value>(objectClass=group)</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name><value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value><source>core-default.xml</source></property>
<property><name>fs.s3n.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.cipher.suite</name><value>AES/CTR/NoPadding</value><source>core-default.xml</source></property>
<property><name>net.topology.script.number.args</name><value>100</value><source>core-default.xml</source></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authentication</name><value>simple</value><source>core-default.xml</source></property>
<property><name>tfile.fs.output.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.secs</name><value>300</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.file.impl</name><value>org.apache.hadoop.fs.local.LocalFs</value><source>core-default.xml</source></property>
<property><name>fs.s3a.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.connect-retry-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.threshold</name><value>2147483647</value><source>core-default.xml</source></property>
<property><name>fs.s3.maxRetries</name><value>4</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.directory.search.timeout</name><value>10000</value><source>core-default.xml</source></property>
<property><name>file.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host</name><value>0.0.0.0</value><source>core-default.xml</source></property>
<property><name>file.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.parent-znode</name><value>/hadoop-ha</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.size</name><value>104857600</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge.age</name><value>86400</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.copy.block.size</name><value>5368709120</value><source>core-default.xml</source></property>
<property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>fs.s3.sleepTimeSeconds</name><value>10</value><source>core-default.xml</source></property>
<property><name>rpc.metrics.quantile.enable</name><value>false</value><source>core-default.xml</source></property>
<property><name>ftp.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.signature.secret.file</name><value>${user.home}/hadoop-http-auth-signature-secret</value><source>core-default.xml</source></property>
<property><name>io.seqfile.sorter.recordlimit</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>s3.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.permissions.umask-mode</name><value>022</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.ssl.enabled</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.s3a.buffer.dir</name><value>${hadoop.tmp.dir}/s3a</value><source>core-default.xml</source></property>
<property><name>s3native.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.warn.after.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>HTTP/_HOST@LOCALHOST</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.num.refill.threads</name><value>2</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.user</name><value>(&amp;(objectClass=user)(sAMAccountName={0}))</value><source>core-default.xml</source></property>
<property><name>fs.automatic.close</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.retry.interval</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.paging.maximum</name><value>5000</value><source>core-default.xml</source></property>
<property><name>s3.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.session-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.har.impl</name><value>org.apache.hadoop.fs.HarFs</value><source>core-default.xml</source></property>
<property><name>io.seqfile.compress.blocksize</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.http.lib.StaticUserWebFilter</value><source>core-default.xml</source></property>
<property><name>fs.s3.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>io.seqfile.lazydecompress</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.common.configuration.version</name><value>0.23.0</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.member</name><value>member</value><source>core-default.xml</source></property>
<property><name>hadoop.security.random.device.file.path</name><value>/dev/urandom</value><source>core-default.xml</source></property>
<property><name>ipc.client.connection.maxidletime</name><value>10000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.timeout</name><value>20000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.uid.cache.secs</name><value>14400</value><source>core-default.xml</source></property>
<property><name>ipc.client.ping</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.kill.max</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.ping.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>io.seqfile.local.dir</name><value>${hadoop.tmp.dir}/io/local</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.buffer.size</name><value>8192</value><source>core-default.xml</source></property>
<property><name>io.native.lib.available</name><value>true</value><source>core-default.xml</source></property>
<property><name>io.file.buffer.size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value><source>core-default.xml</source></property>
<property><name>tfile.fs.input.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.ssl</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.df.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>${user.home}/hadoop.keytab</value><source>core-default.xml</source></property>
<property><name>s3native.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>s3native.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>tfile.io.chunk.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.hostname.verifier</name><value>DEFAULT</value><source>core-default.xml</source></property>
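The `<property>` lines above are the flattened XML of the job's effective configuration (name, value, and the file the value came from). To check which setting won for a given key in a dump like this, you can wrap the fragment in a root element and parse it; a minimal sketch in Python (the two sample properties are copied verbatim from the dump above):

```python
import xml.etree.ElementTree as ET

# Excerpt copied from the configuration dump above.
dump = (
    "<property><name>fs.trash.interval</name><value>0</value>"
    "<source>core-default.xml</source></property>"
    "<property><name>io.file.buffer.size</name><value>4096</value>"
    "<source>core-default.xml</source></property>"
)

# The dump has no single root element, so wrap it before parsing.
root = ET.fromstring("<configuration>" + dump + "</configuration>")

# Map each property name to its (value, source) pair.
settings = {
    p.findtext("name"): (p.findtext("value"), p.findtext("source"))
    for p in root.findall("property")
}

print(settings["fs.trash.interval"])  # ('0', 'core-default.xml')
```

Since every entry here reports `core-default.xml` as its source, no site-specific override (e.g. from core-site.xml) is in effect for these keys — the job is running entirely on shipped defaults.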



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
