Hello folks,

I have a lot of input splits (10k-50k, 128 MB blocks) containing text files. I need to process them line by line, then copy the result into roughly equal-sized "shards".
So I generate a random key (from the range [0:numberOfShards]), which is used to route the map output to different reducers, and the resulting sizes are more or less equal. I know that this is not really efficient, and I was wondering if I could somehow control how keys are routed. For example, could I generate the random keys with hostname prefixes and control which keys are sent to each reducer?

What do you think?

Kind regards,
Mete
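
To make the question concrete, here is a minimal sketch of the two routing schemes in plain Java, with no Hadoop dependency so it runs on its own. The class name `ShardRouting`, the `hostname#n` key format, and the host list are all made up for illustration. In a real job, I assume the body of `partitionFor` would go inside `getPartition(key, value, numPartitions)` of a custom `org.apache.hadoop.mapreduce.Partitioner`, registered with `job.setPartitionerClass(...)`. Note that this only fixes which partition number a key maps to; it does not by itself decide which machine runs that reducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Hadoop: illustrates the routing logic only.
class ShardRouting {

    // Current approach: a uniformly random shard id in [0, numberOfShards).
    static int randomShard(Random rng, int numberOfShards) {
        return rng.nextInt(numberOfShards);
    }

    // Proposed approach: prefix the random part with the mapper's hostname.
    // The "hostname#n" format is an assumption made for this sketch.
    static String hostKey(String hostname, Random rng, int shardsPerHost) {
        return hostname + "#" + rng.nextInt(shardsPerHost);
    }

    // Same shape as Partitioner.getPartition(key, value, numPartitions):
    // each known host owns a contiguous block of partition numbers.
    static int partitionFor(String key, List<String> hosts, int numPartitions) {
        int sep = key.lastIndexOf('#');
        int hostIndex = sep < 0 ? -1 : hosts.indexOf(key.substring(0, sep));
        if (hostIndex < 0) {
            // Malformed key or unknown host: fall back to plain hash routing.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        int local = Integer.parseInt(key.substring(sep + 1));
        int shardsPerHost = Math.max(1, numPartitions / hosts.size());
        return (hostIndex * shardsPerHost + local % shardsPerHost) % numPartitions;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node-a", "node-b");
        Random rng = new Random();
        String key = hostKey("node-b", rng, 2);
        System.out.println(key + " -> partition " + partitionFor(key, hosts, 4));
    }
}
```

With two hosts and four reducers, keys from `node-a` land in partitions 0 and 1 and keys from `node-b` in partitions 2 and 3, so shard sizes stay balanced only as long as each host emits a similar amount of map output.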
