Hi, I am trying to process some data in Hadoop. For testing purposes, I want Hadoop (MapReduce) to process the whole input as one split (one map task). My data size is 5368709120 bytes. But MR only processes 20% of it (equivalent to 8 chunks of 128 MB each) and reports the job as successful.
My data is already divided in HDFS into 40 chunks (128 MB each), and I have already set "mapreduce.input.fileinputformat.split.minsize" and "mapreduce.input.fileinputformat.split.maxsize" to 5368709120 bytes. Here is the output:

19/04/02 18:29:44 INFO client.RMProxy: Connecting to ResourceManager at dmaster/10.40.0.0:8032
19/04/02 18:29:48 INFO input.FileInputFormat: Total input paths to process : 1
19/04/02 18:29:59 INFO mapreduce.JobSubmitter: number of splits:1
19/04/02 18:30:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554217373879_0005
19/04/02 18:30:01 INFO impl.YarnClientImpl: Submitted application application_1554217373879_0005
19/04/02 18:30:01 INFO mapreduce.Job: The url to track the job: http://dmaster:8088/proxy/application_1554217373879_0005/
19/04/02 18:30:01 INFO mapreduce.Job: Running job: job_1554217373879_0005
19/04/02 18:31:08 INFO mapreduce.Job: Job job_1554217373879_0005 running in uber mode : false
19/04/02 18:31:08 INFO mapreduce.Job: map 0% reduce 0%
19/04/02 18:32:20 INFO mapreduce.Job: map 1% reduce 0%
19/04/02 18:32:49 INFO mapreduce.Job: map 2% reduce 0%
19/04/02 18:33:20 INFO mapreduce.Job: map 3% reduce 0%
19/04/02 18:33:52 INFO mapreduce.Job: map 4% reduce 0%
19/04/02 18:34:22 INFO mapreduce.Job: map 5% reduce 0%
19/04/02 18:34:51 INFO mapreduce.Job: map 6% reduce 0%
19/04/02 18:35:21 INFO mapreduce.Job: map 7% reduce 0%
19/04/02 18:35:50 INFO mapreduce.Job: map 8% reduce 0%
19/04/02 18:36:20 INFO mapreduce.Job: map 9% reduce 0%
19/04/02 18:36:46 INFO mapreduce.Job: map 10% reduce 0%
19/04/02 18:37:16 INFO mapreduce.Job: map 11% reduce 0%
19/04/02 18:37:42 INFO mapreduce.Job: map 12% reduce 0%
19/04/02 18:38:11 INFO mapreduce.Job: map 13% reduce 0%
19/04/02 18:38:37 INFO mapreduce.Job: map 14% reduce 0%
19/04/02 18:39:08 INFO mapreduce.Job: map 15% reduce 0%
19/04/02 18:39:48 INFO mapreduce.Job: map 16% reduce 0%
19/04/02 18:41:13 INFO mapreduce.Job: map 17% reduce 0%
19/04/02 18:42:31 INFO mapreduce.Job: map 18% reduce 0%
19/04/02 18:43:57 INFO mapreduce.Job: map 19% reduce 0%
19/04/02 18:45:21 INFO mapreduce.Job: map 20% reduce 0%
19/04/02 18:46:01 INFO mapreduce.Job: map 100% reduce 0%
19/04/02 18:46:06 INFO mapreduce.Job: Job job_1554217373879_0005 completed successfully
19/04/02 18:46:07 INFO mapreduce.Job: Counters: 36
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=117516
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1073741936
        HDFS: Number of bytes written=1402482688
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=882218
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=882218
        Total vcore-milliseconds taken by all map tasks=882218
        Total megabyte-milliseconds taken by all map tasks=903391232
    Map-Reduce Framework
        Map input records=22945792
        Map output records=189399040
        Input split bytes=112
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=17859
        CPU time spent (ms)=217250
        Physical memory (bytes) snapshot=189526016
        Virtual memory (bytes) snapshot=2008051712
        Total committed heap usage (bytes)=152043520
    File Input Format Counters
        Bytes Read=1073741824
    File Output Format Counters
        Bytes Written=1402482688

Any help is appreciated.
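
To double-check the numbers above, here is a small sketch of the arithmetic. It assumes the split size is chosen as max(minsize, min(maxsize, blocksize)), which is my understanding of how FileInputFormat picks it, and a 128 MB block size as described above. The function and constant names are my own and are not taken from Hadoop.

```python
# Sanity check of the figures in the job output.
# Assumption: split size = max(min_size, min(max_size, block_size)).
# All names here are illustrative, not Hadoop identifiers.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB HDFS chunk size (from the post)
TOTAL_BYTES = 5368709120         # full input size, also used for minsize/maxsize
BYTES_READ = 1073741824          # "File Input Format Counters: Bytes Read"


def compute_split_size(block_size, min_size, max_size):
    """Return the split size under the assumed max/min rule."""
    return max(min_size, min(max_size, block_size))


split_size = compute_split_size(BLOCK_SIZE, TOTAL_BYTES, TOTAL_BYTES)
total_chunks = TOTAL_BYTES // BLOCK_SIZE   # chunks stored in HDFS
chunks_read = BYTES_READ // BLOCK_SIZE     # chunks actually consumed
fraction_read = BYTES_READ / TOTAL_BYTES   # share of the input that was read

print("expected split size:", split_size)
print("chunks read:", chunks_read, "of", total_chunks)
print("fraction read:", fraction_read)
```

Under that assumption the settings should give one split covering all 5368709120 bytes, which matches "number of splits:1", yet Bytes Read corresponds to only 8 of the 40 chunks.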
