I have written a MapReduce application that runs on 5 machines with Hadoop 1.0.2 installed. The app uses my custom FileInputFormat, RecordReader, TextOutputFormat, etc., with primary and secondary sort keys for a reduce-side join. It works well when I set the number of reduce tasks to 1. When I use more than one reduce task, my output files come out empty, even though the status output shows the maps produced output (Map output records=8268133). I did not use a Combiner. Does anyone know what I am doing wrong? I appreciate your help.
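For context, the secondary-sort pattern I followed looks roughly like the sketch below. This is simplified, self-contained placeholder code, not my actual classes (CompositeKey and the field names are illustrative): the idea is that the sort order uses both the primary and the secondary key, while the partition is computed from the primary (join) key only, so that with more than one reducer all records sharing a join key still meet in the same reduce call.

```java
// Simplified, self-contained sketch of the composite-key pattern for a
// reduce-side join with secondary sort (placeholder names, not real classes).
public class CompositeKeyDemo {

    // Composite key: sorted by (primary, secondary), partitioned by primary only.
    static class CompositeKey implements Comparable<CompositeKey> {
        final String primary;   // the join key
        final int secondary;    // tag/order within the join key (e.g. which side)

        CompositeKey(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }

        @Override
        public int compareTo(CompositeKey other) {
            int cmp = primary.compareTo(other.primary);
            return cmp != 0 ? cmp : Integer.compare(secondary, other.secondary);
        }
    }

    // The partition function must ignore the secondary part; if it hashed the
    // whole composite key, records sharing a join key could land on different
    // reducers once numReduceTasks > 1, and the join would emit nothing.
    static int getPartition(CompositeKey key, int numReduceTasks) {
        return (key.primary.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        CompositeKey left  = new CompositeKey("user42", 0); // one side of the join
        CompositeKey right = new CompositeKey("user42", 1); // the other side
        // Both sides of the same join key must map to the same partition.
        System.out.println(getPartition(left, 2) == getPartition(right, 2));
        // And the secondary key must still order records within the group.
        System.out.println(left.compareTo(right) < 0);
    }
}
```

In a real job this partition logic would live in a custom Partitioner, with a matching grouping comparator that compares only the primary key.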
Here is the output status:

12/08/14 18:36:18 INFO mapred.JobClient:  map 100% reduce 100%
12/08/14 18:36:23 INFO mapred.JobClient: Job complete: job_201208081134_0075
12/08/14 18:36:23 INFO mapred.JobClient: Counters: 28
12/08/14 18:36:23 INFO mapred.JobClient:   Job Counters
12/08/14 18:36:23 INFO mapred.JobClient:     Launched reduce tasks=2
12/08/14 18:36:23 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=241075
12/08/14 18:36:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/14 18:36:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/14 18:36:23 INFO mapred.JobClient:     Launched map tasks=18
12/08/14 18:36:23 INFO mapred.JobClient:     Data-local map tasks=18
12/08/14 18:36:23 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=166309
12/08/14 18:36:23 INFO mapred.JobClient:   File Output Format Counters
12/08/14 18:36:23 INFO mapred.JobClient:     Bytes Written=0
12/08/14 18:36:23 INFO mapred.JobClient:   FileSystemCounters
12/08/14 18:36:23 INFO mapred.JobClient:     FILE_BYTES_READ=2244635327
12/08/14 18:36:23 INFO mapred.JobClient:     HDFS_BYTES_READ=996834876
12/08/14 18:36:23 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3300936988
12/08/14 18:36:23 INFO mapred.JobClient:   File Input Format Counters
12/08/14 18:36:23 INFO mapred.JobClient:     Bytes Read=996832672
12/08/14 18:36:23 INFO mapred.JobClient:   Map-Reduce Framework
12/08/14 18:36:23 INFO mapred.JobClient:     Map output materialized bytes=1055918357
12/08/14 18:36:23 INFO mapred.JobClient:     Map input records=8268133
12/08/14 18:36:23 INFO mapred.JobClient:     Reduce shuffle bytes=1055918357
12/08/14 18:36:23 INFO mapred.JobClient:     Spilled Records=25895806
12/08/14 18:36:23 INFO mapred.JobClient:     Map output bytes=1039334953
12/08/14 18:36:23 INFO mapred.JobClient:     CPU time spent (ms)=357670
12/08/14 18:36:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=3258384384
12/08/14 18:36:23 INFO mapred.JobClient:     Combine input records=0
12/08/14 18:36:23 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2204
12/08/14 18:36:23 INFO mapred.JobClient:     Reduce input records=8268133
12/08/14 18:36:23 INFO mapred.JobClient:     Reduce input groups=1294180
12/08/14 18:36:23 INFO mapred.JobClient:     Combine output records=0
12/08/14 18:36:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3936837632
12/08/14 18:36:23 INFO mapred.JobClient:     Reduce output records=0
12/08/14 18:36:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=10375229440
12/08/14 18:36:23 INFO mapred.JobClient:     Map output records=8268133

--
View this message in context: http://old.nabble.com/No-output-when-more-than-one-reduce-task-tp34299347p34299347.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
