Without a reducer, does the sequence of output part-XXXXX
files correspond to the sequence of input records, or could some
shuffling occur? If it does not match, how can I find the part
corresponding to a given input record? I had some hopes for
mapreduce_task_partition, but it seems the number of partitions
does not necessarily match the number of requested map tasks
(e.g., -D mapreduce.job.maps=1271 gave me 1271 parts for 1271
input records, but only 1219 unique mapreduce_task_partition values ...).

Many thanks
Rupert
