Hi everybody,

I'm trying to run the wordcount example on 1000 copies of a Gutenberg
file downloaded and extracted from
http://www.gutenberg.lib.md.us/1/0/0/0/10001/10001.zip, like so:
$ cd /home/input
$ for i in `seq 1 999`; do cp 0.txt $i.txt; done
$ start-dfs.sh
$ hdfs namenode -format
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hduser
$ hdfs dfs -put /home/input/ input
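To rule out an input problem, I replayed the copy loop in a throwaway
scratch directory (hypothetical path, not my real input dir) and counted
the files, just to confirm the loop really yields 1000 files (0.txt plus
1.txt..999.txt):

```shell
# Replay of the copy loop in a scratch directory (hypothetical path),
# to confirm it produces exactly 1000 files.
mkdir -p /tmp/input-check && cd /tmp/input-check
touch 0.txt
for i in `seq 1 999`; do cp 0.txt $i.txt; done
ls | wc -l    # prints 1000
```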
As you can see from http://hdmaster:50070/dfshealth.html#tab-datanode,
the multiplied Gutenberg file has been successfully uploaded 1000 times
to HDFS:
Node                          Last contact  Admin State  Capacity   Used      Non DFS Used  Remaining  Blocks  Block pool used   Failed Volumes  Version
hdmaster (10.10.10.10:50010)  1             In Service   238.32 GB  56.11 MB  48.73 GB      189.53 GB  1000    56.11 MB (0.02%)  0               2.6.4
When I run the wordcount example without YARN in a pseudo-distributed
cluster, it works (and fast):
$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
16/09/04 18:18:29 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:29 INFO reduce.MergeManagerImpl: finalMerge called with
1000 in-memory map-outputs and 0 on-disk map-outputs
16/09/04 18:18:29 INFO mapred.Merger: Merging 1000 sorted segments
16/09/04 18:18:29 INFO mapred.Merger: Down to the last merge-pass, with
1000 segments left of total size: 40716000 bytes
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merged 1000 segments,
40722000 bytes to disk to satisfy reduce memory limit
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 1 files,
40720006 bytes from disk
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 0 segments, 0
bytes from memory into reduce
16/09/04 18:18:31 INFO mapred.Merger: Merging 1 sorted segments
16/09/04 18:18:31 INFO mapred.Merger: Down to the last merge-pass, with
1 segments left of total size: 40719996 bytes
16/09/04 18:18:31 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:31 INFO Configuration.deprecation: mapred.skip.on is
deprecated. Instead, use mapreduce.job.skiprecords
16/09/04 18:18:33 INFO mapred.Task:
Task:attempt_local1736492896_0001_r_000000_0 is done. And is in the
process of committing
16/09/04 18:18:33 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:33 INFO mapred.Task: Task
attempt_local1736492896_0001_r_000000_0 is allowed to commit now
16/09/04 18:18:33 INFO output.FileOutputCommitter: Saved output of task
'attempt_local1736492896_0001_r_000000_0' to
hdfs://hdmaster:54310/user/hduser/output/_temporary/0/task_local1736492896_0001_r_000000
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce > reduce
16/09/04 18:18:33 INFO mapred.Task: Task
'attempt_local1736492896_0001_r_000000_0' done.
16/09/04 18:18:33 INFO mapred.LocalJobRunner: Finishing task:
attempt_local1736492896_0001_r_000000_0
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce task executor
complete.
16/09/04 18:18:33 INFO mapreduce.Job: map 100% reduce 100%
16/09/04 18:18:33 INFO mapreduce.Job: Job job_local1736492896_0001
completed successfully
16/09/04 18:18:33 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=2865334368
FILE: Number of bytes written=21137755248
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=26333765000
HDFS: Number of bytes written=37800
HDFS: Number of read operations=1007007
HDFS: Number of large read operations=0
HDFS: Number of write operations=1003
Map-Reduce Framework
Map input records=958000
Map output records=8807000
Map output bytes=85735000
Map output materialized bytes=40726000
Input split bytes=111890
Combine input records=8807000
Combine output records=3035000
Reduce input groups=3035
Reduce shuffle bytes=40726000
Reduce input records=3035000
Reduce output records=3035
Spilled Records=6070000
Shuffled Maps =1000
Failed Shuffles=0
Merged Map outputs=1000
GC time elapsed (ms)=6358
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1586989891584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=52510000
File Output Format Counters
Bytes Written=37800
90.20user 2.64system 1:01.70elapsed 150%CPU (0avgtext+0avgdata
1821840maxresident)k
0inputs+170688outputs (0major+61251minor)pagefaults 0swaps
When I add the minimal settings needed to run the program in a
pseudo-distributed YARN cluster, i.e. in mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The framework for running mapreduce jobs</description>
  </property>

and in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
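For responders' reference, these are the memory-related properties that
are commonly tuned alongside the two settings above. The property names
are the real 2.6.x names, but the values shown are illustrative
placeholders, not my actual configuration:

```xml
  <!-- Illustrative yarn-site.xml fragment; values are placeholders -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>  <!-- total RAM YARN may hand out on this node -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>  <!-- largest single container request -->
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>  <!-- kill containers over the virtual memory limit -->
  </property>
```

I can paste my actual values if that helps.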
the wordcount program randomly exits and never finishes. I've tried
different Hadoop versions (2.6.4, 2.7.2, 2.7.3) on both an up-to-date
Ubuntu 16.04 release and an up-to-date Gentoo Linux system, and with
Oracle JDK 1.7 and 1.8, to no avail.
I cannot figure out how to debug the problem, and I've now spent days
trying to investigate it.
$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
Below is the output after about 190 reduce operations. As you can see,
the whole cluster comes down (my login shell gets terminated as well).
Sometimes the cluster is terminated after only a handful of operations.
As the dstat output captured while the wordcount example was running
shows, memory is not the problem:
$ dstat -a -m
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage-----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free
 97   3   0   0   0   0|   0  1316k|  18k  776B|   0     0 |3284   10k |8396M  119M 4338M 18.7G
 96   4   0   0   0   0|   0     0 | 241B  152B|   0     0 |3182   11k |8705M  119M 4339M 18.4G
 96   4   0   0   0   0|   0   936k| 582B  316B|   0     0 |3288   15k |8251M  119M 4339M 18.9G
 96   4   0   0   0   0|  52k  628k| 518B  316B|   0     0 |3141   10k |8326M  119M 4339M 18.8G
 97   3   0   0   0   0|   0    32k| 652B  450B|   0     0 |3081  9674 |8490M  119M 4339M 18.7G
 97   3   0   0   0   0|  52k 1372k| 393B  292B|   0     0 |3115   12k |8514M  119M 4340M 18.6G
 96   4   0   0   0   0|   0   184k|   0    70B|   0     0 |3107   11k |8291M  119M 4340M 18.8G
 97   3   0   0   0   0|   0    72k| 847B  474B|   0     0 |3156  8630 |8217M  119M 4344M 18.9G
 97   3   0   0   0   0|   0   144k| 546B  298B|   0     0 |3176  7967 |8048M  119M 4345M 19.1G
 97   3   0   0   0   0|   0   836k| 399B  298B|   0     0 |3228   11k |8231M  119M 4345M 18.9G
 97   3   0   0   0   0|   0  1448k|  70B   70B|   0     0 |3096   16k |8469M  119M 4345M 18.7G
 96   4   0   0   0   0|   0    88k| 259B  228B|   0     0 |3197   13k |8387M  119M 4346M 18.7G
 97   3   0   0   0   0|   0   144k| 917B  544B|   0     0 |3044  7261 |8293M  119M 4346M 18.8G
 96   4   0   0   0   0|   0    72k| 259B  158B|   0     0 |3231   12k |8342M  119M 4346M 18.8G
 97   3   0   0   0   0|   0   828k| 518B  316B|   0     0 |3091  9285 |8828M  119M 4347M 18.3G
 97   3   0   0   0   0|   0  2884k|   0     0 |   0     0 |3082   11k |8511M  119M 4347M 18.6G
 96   4   0   0   0   0|   0   188k| 259B  158B|   0     0 |3105   24k |8303M  119M 4347M 18.8G
 96   4   0   0   0   0|   0   184k| 518B  316B|   0     0 |3194  9256 |8161M  119M 4347M 19.0G
 97   3   0   0   0   0|  52k   40k| 259B  158B|   0     0 |3014  8124 |8503M  119M 4348M 18.6G
 97   3   0   0   0   0|  52k  812k|   0     0 |   0     0 |3217   23k |8531M  119M 4348M 18.6G
 95   4   0   0   0   0|   0  1692k| 329B  428B|   0     0 |3345   18k |7954M  119M 4348M 19.2G
 17   5  78   0   0   0|   0   616k| 288B  213B|   0     0 |2349   11k |3462M  119M 4347M 23.6G
  0   0  99   0   0   0|   0  1012k|   0     0 |   0     0 | 290   640 |3462M  119M 4347M 23.6G
  0   0 100   0   0   0|   0     0 |   0     0 |   0     0 | 340   979 |3462M  119M 4347M 23.6G
Here's the excerpt from yarn-hduser-nodemanager-hdmaster.log (note the
containers exiting with code 143, i.e. 128 + 15 = SIGTERM, and the
NodeManager itself logging RECEIVED SIGNAL 15):
5642 2016-09-04 17:22:44,012 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Start request for container_1473002371122_0001_01_000194 by user hduser
5643 2016-09-04 17:22:44,012 INFO
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
IP=10.10.10.10 OPERATION=Start Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1473002371122_0001
CONTAINERID=container_1473002371122_0001_01_000194
5644 2016-09-04 17:22:44,015 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Adding container_1473002371122_0001_01_000194 to application
application_1473002371122_0001
5645 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000194 transitioned from NEW
to LOCALIZING
5646 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event CONTAINER_INIT for appId application_1473002371122_0001
5647 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event APPLICATION_INIT for appId application_1473002371122_0001
5648 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got APPLICATION_INIT for service mapreduce_shuffle
5649 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.mapred.ShuffleHandler: Added token for
job_1473002371122_0001
5650 2016-09-04 17:22:44,016 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000194 transitioned from
LOCALIZING to LOCALIZED
5651 2016-09-04 17:22:44,032 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Start request for container_1473002371122_0001_01_000193 by user hduser
5652 2016-09-04 17:22:44,036 INFO
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
IP=10.10.10.10 OPERATION=Start Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1473002371122_0001
CONTAINERID=container_1473002371122_0001_01_000193
5653 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Adding container_1473002371122_0001_01_000193 to application
application_1473002371122_0001
5654 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000193 transitioned from NEW
to LOCALIZING
5655 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event CONTAINER_INIT for appId application_1473002371122_0001
5656 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got event APPLICATION_INIT for appId application_1473002371122_0001
5657 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices:
Got APPLICATION_INIT for service mapreduce_shuffle
5658 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.mapred.ShuffleHandler: Added token for
job_1473002371122_0001
5659 2016-09-04 17:22:44,037 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000193 transitioned from
LOCALIZING to LOCALIZED
5660 2016-09-04 17:22:44,097 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000194 transitioned from
LOCALIZED to RUNNING
5661 2016-09-04 17:22:44,112 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
launchContainer: [bash,
/home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000194/default_container_executor.sh]
5662 2016-09-04 17:22:44,118 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 1354 for container-id
container_1473002371122_0001_01_000179: 167.5 MB of 1 GB physical memory
used; 683.8 MB of 6 GB virtual memory used
5663 2016-09-04 17:22:44,188 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000193 transitioned from
LOCALIZED to RUNNING
5664 2016-09-04 17:22:44,191 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
launchContainer: [bash,
/home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000193/default_container_executor.sh]
5665 2016-09-04 17:22:44,212 INFO
SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for
appattempt_1473002371122_0001_000001 (auth:SIMPLE)
5666 2016-09-04 17:22:44,213 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Stopping container with container Id:
container_1473002371122_0001_01_000174
5667 2016-09-04 17:22:44,213 INFO
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser
IP=10.10.10.10 OPERATION=Stop Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1473002371122_0001
CONTAINERID=container_1473002371122_0001_01_000174
5668 2016-09-04 17:22:44,218 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000174 transitioned from
RUNNING to KILLING
5669 2016-09-04 17:22:44,218 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1473002371122_0001_01_000174
5670 2016-09-04 17:22:44,245 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000181 is : 143
5671 2016-09-04 17:22:44,246 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000183 is : 143
5672 2016-09-04 17:22:44,246 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000189 is : 143
5673 2016-09-04 17:22:44,246 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000191 is : 143
5674 2016-09-04 17:22:44,247 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000190 is : 143
5675 2016-09-04 17:22:44,247 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000182 is : 143
5676 2016-09-04 17:22:44,248 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000186 is : 143
5677 2016-09-04 17:22:44,252 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000192 is : 143
5678 2016-09-04 17:22:44,255 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000180 is : 143
5679 2016-09-04 17:22:44,255 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000187 is : 143
5680 2016-09-04 17:22:44,270 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000185 is : 143
5681 2016-09-04 17:22:44,281 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000184 is : 143
5682 2016-09-04 17:22:44,281 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000188 is : 143
5683 2016-09-04 17:22:44,281 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000179 is : 143
5684 2016-09-04 17:22:44,282 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000001 is : 143
5685 2016-09-04 17:22:44,282 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000194 is : 143
5686 2016-09-04 17:22:44,296 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000178 is : 143
5687 2016-09-04 17:22:44,296 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000175 is : 143
5688 2016-09-04 17:22:44,307 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000176 is : 143
5689 2016-09-04 17:22:44,308 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000174 is : 143
5690 2016-09-04 17:22:44,316 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000177 is : 143
5691 2016-09-04 17:22:44,323 ERROR
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL
15: SIGTERM
5692 2016-09-04 17:22:44,348 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000076 is : 143
5693 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000181 transitioned from
RUNNING to EXITED_WITH_FAILURE
5694 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000189 transitioned from
RUNNING to EXITED_WITH_FAILURE
5695 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000183 transitioned from
RUNNING to EXITED_WITH_FAILURE
5696 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000191 transitioned from
RUNNING to EXITED_WITH_FAILURE
5697 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000190 transitioned from
RUNNING to EXITED_WITH_FAILURE
5698 2016-09-04 17:22:44,383 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000182 transitioned from
RUNNING to EXITED_WITH_FAILURE
5699 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000186 transitioned from
RUNNING to EXITED_WITH_FAILURE
5700 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000192 transitioned from
RUNNING to EXITED_WITH_FAILURE
5701 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000180 transitioned from
RUNNING to EXITED_WITH_FAILURE
5702 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000187 transitioned from
RUNNING to EXITED_WITH_FAILURE
5703 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000185 transitioned from
RUNNING to EXITED_WITH_FAILURE
5704 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000184 transitioned from
RUNNING to EXITED_WITH_FAILURE
5705 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000188 transitioned from
RUNNING to EXITED_WITH_FAILURE
5706 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000179 transitioned from
RUNNING to EXITED_WITH_FAILURE
5707 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000001 transitioned from
RUNNING to EXITED_WITH_FAILURE
5708 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000194 transitioned from
RUNNING to EXITED_WITH_FAILURE
5709 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000178 transitioned from
RUNNING to EXITED_WITH_FAILURE
5710 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000175 transitioned from
RUNNING to EXITED_WITH_FAILURE
5711 2016-09-04 17:22:44,384 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000176 transitioned from
RUNNING to EXITED_WITH_FAILURE
5712 2016-09-04 17:22:44,385 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000174 transitioned from
KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
5713 2016-09-04 17:22:44,385 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000177 transitioned from
RUNNING to EXITED_WITH_FAILURE
5714 2016-09-04 17:22:44,385 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1473002371122_0001_01_000076 transitioned from
RUNNING to EXITED_WITH_FAILURE
5715 2016-09-04 17:22:44,385 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1473002371122_0001_01_000181
5716 2016-09-04 17:22:44,389 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1473002371122_0001_01_000193 is : 143
5717 2016-09-04 17:22:44,394 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 1579 for container-id
container_1473002371122_0001_01_000183: 135.3 MB of 1 GB physical memory
used; 682.6 MB of 6 GB virtual memory used
5718 2016-09-04 17:22:44,399 ERROR
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL
15: SIGTERM
5719 2016-09-04 17:22:44,407 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 1722 for container-id
container_1473002371122_0001_01_000186: 0B of 1 GB physical memory used;
0B of 6 GB virtual memory used
5720 2016-09-04 17:22:44,414 INFO org.mortbay.log: Stopped
[email protected]:8042
5721 2016-09-04 17:22:44,415 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1473002371122_0001_01_000189
5722 2016-09-04 17:22:44,420 ERROR
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL
15: SIGTERM
5723 2016-09-04 17:22:44,420 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 1265 for container-id
container_1473002371122_0001_01_000176: 0B of 1 GB physical memory used;
0B of 6 GB virtual memory used
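The exit codes in the log above can be tallied with a quick grep. A
minimal sketch, run here against an inline two-line excerpt standing in
for the real yarn-hduser-nodemanager-hdmaster.log (143 = 128 + 15, i.e.
the container JVMs were killed with SIGTERM):

```shell
# Tally container exit codes from a NodeManager log.
# The inline sample stands in for yarn-hduser-nodemanager-hdmaster.log.
cat > nm-excerpt.log <<'EOF'
2016-09-04 17:22:44,245 WARN DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000181 is : 143
2016-09-04 17:22:44,246 WARN DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000183 is : 143
EOF
grep -o 'is : [0-9]*$' nm-excerpt.log | awk '{print $3}' | sort | uniq -c
# prints a count per exit code, here "2 143" (count, code)
```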
Here's the output from running:
$ hadoop org.apache.hadoop.conf.Configuration
<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>20000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries.on.timeouts</name><value>45</value><source>core-default.xml</source></property>
<property><name>hadoop.user.group.static.mapping.overrides</name><value>dr.who=;</value><source>core-default.xml</source></property>
<property><name>hadoop.tmp.dir</name><value>/home/hduser/tmp</value><source>core-site.xml</source></property>
<property><name>hadoop.security.java.secure.random.algorithm</name><value>SHA1PRNG</value><source>core-default.xml</source></property>
<property><name>nfs.exports.allowed.hosts</name><value>*
rw</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.check-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>ipc.client.idlethreshold</name><value>4000</value><source>core-default.xml</source></property>
<property><name>fs.trash.checkpoint.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>io.skip.checksum.errors</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.negative-cache.secs</name><value>30</value><source>core-default.xml</source></property>
<property><name>fs.har.impl.disable.cache</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.defaultFS</name><value>hdfs://hdmaster:54310</value><source>core-site.xml</source></property>
<property><name>fs.client.resolve.remote.symlinks</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.StandardSocketFactory</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.authentication.retry-count</name><value>1</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value><source>core-default.xml</source></property>
<property><name>net.topology.impl</name><value>org.apache.hadoop.net.NetworkTopology</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.require.client.cert</name><value>false</value><source>core-default.xml</source></property>
<property><name>io.bytes.per.checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>file.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.new-active.rpc-timeout.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.acl</name><value>world:anyone:rwcda</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host.port</name><value>21</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><source>core-default.xml</source></property>
<property><name>s3.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>net.topology.node.switch.mapping.impl</name><value>org.apache.hadoop.net.ScriptBasedMapping</value><source>core-default.xml</source></property>
<property><name>fs.s3.buffer.dir</name><value>${hadoop.tmp.dir}/s3</value><source>core-default.xml</source></property>
<property><name>s3native.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.error.rate</name><value>0.005</value><source>core-default.xml</source></property>
<property><name>ftp.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.group.name</name><value>cn</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.rpc-timeout.ms</name><value>45000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authorization</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>ipc.client.fallback-to-simple-auth-allowed</name><value>false</value><source>core-default.xml</source></property>
<property><name>ipc.server.listen.queue.size</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled.protocols</name><value>TLSv1</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.low-watermark</name><value>0.3f</value><source>core-default.xml</source></property>
<property><name>s3native.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>file.replication</name><value>1</value><source>core-default.xml</source></property>
<property><name>ftp.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>hadoop.work.around.non.threadsafe.getpwuid</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.du.interval</name><value>600000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.type</name><value>simple</value><source>core-default.xml</source></property>
<property><name>hadoop.http.staticuser.user</name><value>dr.who</value><source>core-default.xml</source></property>
<property><name>hadoop.util.hash.type</name><value>murmur</value><source>core-default.xml</source></property>
<property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.size</name><value>500</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.maximum</name><value>15</value><source>core-default.xml</source></property>
<property><name>fs.s3a.attempts.maximum</name><value>10</value><source>core-default.xml</source></property>
<property><name>io.map.index.interval</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.expiry</name><value>43200000</value><source>core-default.xml</source></property>
<property><name>hadoop.kerberos.kinit.command</name><value>kinit</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.hdfs.impl</name><value>org.apache.hadoop.fs.Hdfs</value><source>core-default.xml</source></property>
<property><name>io.map.index.skip</name><value>0</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.token.validity</name><value>36000</value><source>core-default.xml</source></property>
<property><name>hadoop.jetty.logs.serve.aliases</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>io.compression.codec.bzip2.library</name><value>system-native</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.connection.retries</name><value>1</value><source>core-default.xml</source></property>
<property><name>fs.swift.impl</name><value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.sleep-after-disconnect.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.timeout</name><value>5000</value><source>core-default.xml</source></property>
<property><name>ipc.client.rpc-timeout.ms</name><value>0</value><source>core-default.xml</source></property>
<property><name>file.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.viewfs.impl</name><value>org.apache.hadoop.fs.viewfs.ViewFs</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.group</name><value>(objectClass=group)</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name><value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value><source>core-default.xml</source></property>
<property><name>fs.s3n.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.cipher.suite</name><value>AES/CTR/NoPadding</value><source>core-default.xml</source></property>
<property><name>net.topology.script.number.args</name><value>100</value><source>core-default.xml</source></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authentication</name><value>simple</value><source>core-default.xml</source></property>
<property><name>tfile.fs.output.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.secs</name><value>300</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.file.impl</name><value>org.apache.hadoop.fs.local.LocalFs</value><source>core-default.xml</source></property>
<property><name>fs.s3a.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.connect-retry-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.threshold</name><value>2147483647</value><source>core-default.xml</source></property>
<property><name>fs.s3.maxRetries</name><value>4</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.directory.search.timeout</name><value>10000</value><source>core-default.xml</source></property>
<property><name>file.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host</name><value>0.0.0.0</value><source>core-default.xml</source></property>
<property><name>file.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.parent-znode</name><value>/hadoop-ha</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.size</name><value>104857600</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge.age</name><value>86400</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.copy.block.size</name><value>5368709120</value><source>core-default.xml</source></property>
<property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>fs.s3.sleepTimeSeconds</name><value>10</value><source>core-default.xml</source></property>
<property><name>rpc.metrics.quantile.enable</name><value>false</value><source>core-default.xml</source></property>
<property><name>ftp.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.signature.secret.file</name><value>${user.home}/hadoop-http-auth-signature-secret</value><source>core-default.xml</source></property>
<property><name>io.seqfile.sorter.recordlimit</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>s3.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.permissions.umask-mode</name><value>022</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.ssl.enabled</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.s3a.buffer.dir</name><value>${hadoop.tmp.dir}/s3a</value><source>core-default.xml</source></property>
<property><name>s3native.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.warn.after.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>HTTP/_HOST@LOCALHOST</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.num.refill.threads</name><value>2</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.user</name><value>(&(objectClass=user)(sAMAccountName={0}))</value><source>core-default.xml</source></property>
<property><name>fs.automatic.close</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.retry.interval</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.paging.maximum</name><value>5000</value><source>core-default.xml</source></property>
<property><name>s3.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.session-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.har.impl</name><value>org.apache.hadoop.fs.HarFs</value><source>core-default.xml</source></property>
<property><name>io.seqfile.compress.blocksize</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.http.lib.StaticUserWebFilter</value><source>core-default.xml</source></property>
<property><name>fs.s3.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>io.seqfile.lazydecompress</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.common.configuration.version</name><value>0.23.0</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.member</name><value>member</value><source>core-default.xml</source></property>
<property><name>hadoop.security.random.device.file.path</name><value>/dev/urandom</value><source>core-default.xml</source></property>
<property><name>ipc.client.connection.maxidletime</name><value>10000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.timeout</name><value>20000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.uid.cache.secs</name><value>14400</value><source>core-default.xml</source></property>
<property><name>ipc.client.ping</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.kill.max</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.ping.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>io.seqfile.local.dir</name><value>${hadoop.tmp.dir}/io/local</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.buffer.size</name><value>8192</value><source>core-default.xml</source></property>
<property><name>io.native.lib.available</name><value>true</value><source>core-default.xml</source></property>
<property><name>io.file.buffer.size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value><source>core-default.xml</source></property>
<property><name>tfile.fs.input.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.ssl</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.df.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>${user.home}/hadoop.keytab</value><source>core-default.xml</source></property>
<property><name>s3native.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>s3native.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>tfile.io.chunk.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.hostname.verifier</name><value>DEFAULT</value><source>core-default.xml</source></property>
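When comparing a dump like the one above against a cluster's expected settings, it can help to load it programmatically instead of eyeballing hundreds of lines. Below is a minimal sketch (the function name `parse_conf` and the two-property sample are illustrative, not from the original post) that parses Hadoop-style `<property>` elements into a dict keyed by property name. Note the dump as pasted here has no surrounding `<configuration>` root and one LDAP filter value contains an unescaped `&`, so a literal copy-paste would need that cleaned up before it parses as XML.

```python
import xml.etree.ElementTree as ET

def parse_conf(xml_text):
    """Parse a Hadoop configuration dump into {name: (value, source)}."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        # Each <property> carries <name>, <value>, and the file it came from.
        name = prop.findtext("name")
        value = prop.findtext("value")
        source = prop.findtext("source")
        conf[name] = (value, source)
    return conf

# Two properties taken from the dump above, wrapped in a root element.
sample = """<configuration>
<property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>io.file.buffer.size</name><value>4096</value><source>core-default.xml</source></property>
</configuration>"""

conf = parse_conf(sample)
print(conf["fs.trash.interval"])  # ('0', 'core-default.xml')
```

Since every property here reports `core-default.xml` as its source, none of these values were overridden by a site file; filtering on the `source` field is a quick way to spot which settings a cluster actually customizes.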