Without a reducer, does the order of the output part-XXXXX files correspond to the order of the input records, or can some shuffling occur? If it does not, how can I find the part file that corresponds to a given input record? I had hoped to use mapreduce_task_partition, but the number of distinct partitions does not necessarily match the number of requested map tasks. For example, running with -D mapreduce.job.maps=1271 gave me 1271 parts for 1271 input records, but only 1219 unique mapreduce_task_partition values ...
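For reference, here is a minimal diagnostic sketch of how the partition values can be observed per record. It assumes a Hadoop streaming mapper written in Python (the question does not say which language the mapper uses) and that the job property `mapreduce.task.partition` is exported to the mapper's environment as `mapreduce_task_partition`, as streaming does for job configuration properties. Each output line is prefixed with that value, so every record in a part file can be traced back to the task that emitted it. The helper name `tag_records` is hypothetical.

```python
import os
import sys


def tag_records(lines, partition):
    """Prefix each input record with the partition ID of the task that handled it."""
    for line in lines:
        # Strip only the trailing newline so the record content is kept intact.
        yield f"{partition}\t{line.rstrip(chr(10))}"


def main():
    # Assumption: Hadoop streaming exposes job configuration properties as
    # environment variables with '.' replaced by '_'.
    partition = os.environ.get("mapreduce_task_partition", "unknown")
    for out in tag_records(sys.stdin, partition):
        print(out)


if __name__ == "__main__":
    main()
```

Comparing the tagged partition values against the part-XXXXX file each line ends up in would show directly whether the two numberings agree for a given run.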
Many thanks,
Rupert
