streaming in 2.10.0 : sequence of parts

Rupert Mazzucco Tue, 18 May 2021 01:51:51 -0700

Without a reducer, does the sequence of output part-XXXXX
files correspond to the sequence of input records, or could some
shuffling occur? If it does not, how can I find the part file
corresponding to a given input record? I had some hope for
mapreduce_task_partition, but the number of partitions does not
seem to match the number of requested map tasks
(e.g., -D mapreduce.job.maps=1271 gave me 1271 parts for 1271
input records, but only 1219 unique mapreduce_task_partition values ...)
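One way to inspect the record-to-part mapping empirically is to have the mapper tag every record it emits with values that streaming exports to its environment. Below is a minimal Python sketch. It assumes that streaming exposes the job properties `mapreduce.task.partition` and `mapreduce.task.id` as the environment variables `mapreduce_task_partition` and `mapreduce_task_id`, with dots replaced by underscores. The function name `tag_records` is hypothetical.

```python
import os
import sys
from typing import Dict, Iterable, Iterator


def tag_records(lines: Iterable[str], env: Dict[str, str]) -> Iterator[str]:
    """Prefix each input record with the task partition and task ID.

    Assumption: streaming exports job configuration properties to the
    mapper's environment with dots replaced by underscores, so
    mapreduce.task.partition appears as mapreduce_task_partition.
    Missing variables fall back to "NA".
    """
    partition = env.get("mapreduce_task_partition", "NA")
    task_id = env.get("mapreduce_task_id", "NA")
    for line in lines:
        record = line.rstrip("\n")
        # Tab-separated output: by default streaming treats the text
        # before the first tab as the key.
        yield f"{partition}\t{task_id}\t{record}"


def main() -> None:
    # Read records from stdin and write tagged records to stdout.
    for out in tag_records(sys.stdin, dict(os.environ)):
        print(out)


if __name__ == "__main__":
    main()
```

With zero reducers (`-D mapreduce.job.reduces=0`), each line in a part file would then carry the partition and task ID of the map task that wrote it.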


Many thanks
Rupert
