Hi,

Can you tell more about:
* How big is N
* How big is the input dataset
* How many mappers you have
* Do input splits correlate with the sorting criterion for top N?

Depending on the answers, very different strategies will be optimal.

On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar <[email protected]> wrote:

> I am looking for a better solution for this.
>
> One way to do this would be to find the top N values from each mapper and
> then find the top N out of them in 1 reducer. I am afraid that
> this won't work effectively if my N is larger than the number of values in
> my inputsplit (or mapper input).
>
> The other way is to just sort all of them in 1 reducer and then do the cat of
> top-N.
>
> Wondering if there is any better approach to do this?
>
> Regards
> Praveenesh
>

-- 
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
http://jkff.info/software/timeplotters - my performance visualization tools
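
For reference, the first approach from the quoted message (top N per mapper, then top N of those in 1 reducer) can be sketched in plain Python, outside of Hadoop. This is only an illustration under assumptions: the function names (top_n_per_split, merge_top_n, top_n) and the sample data are made up here, and each "split" is just an in-memory list of numbers.

```python
import heapq


def top_n_per_split(split, n):
    """Mapper side (hypothetical helper): keep only the n largest values of one split."""
    # If the split holds fewer than n values, all of them are returned.
    return heapq.nlargest(n, split)


def merge_top_n(partials, n):
    """Single-reducer side (hypothetical helper): merge candidates, keep the n largest."""
    return heapq.nlargest(n, (value for part in partials for value in part))


def top_n(splits, n):
    """Run the two phases over a list of in-memory splits."""
    return merge_top_n([top_n_per_split(split, n) for split in splits], n)


if __name__ == "__main__":
    # Made-up sample data: three "input splits".
    splits = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
    print(top_n(splits, 4))  # [9, 8, 7, 6]
```

In this sketch the result stays correct when N is larger than a split, since such a split simply forwards all of its values. The cost is that the single reducer then receives close to the whole dataset, which is the efficiency concern raised in the quoted message.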
