Hi Madhav,

Could you share some more information here? When you say a few nodes are not utilized, is it always the same nodes that sit idle?
Also, how long does each of these containers run on average? Please make sure you have provided a large enough split size so that the containers are not short-running.

Thanks,
Sunil

On Tue, Aug 9, 2016 at 4:49 AM Madhav Sharan <[email protected]> wrote:
> Hi Hadoop users,
>
> I am running an m/r job with an input file of 23 million records. I can
> see that not all our nodes are getting used.
>
> What can I change to utilize all nodes?
>
> Containers   Mem Used   Mem Avail   Vcores used   Vcores avail
> 8            11.25 GB   0 B         8             0
> 0            0 B        11.25 GB    0             8
> 0            0 B        11.25 GB    0             8
> 8            11.25 GB   0 B         8             0
> 8            11.25 GB   0 B         8             0
> 7            11.25 GB   0 B         7             1
> 5            7.03 GB    4.22 GB     5             3
> 0            0 B        11.25 GB    0             8
> 0            0 B        11.25 GB    0             8
>
> My command looks like:
>
> hadoop jar \
>   target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
>   gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
>   /user/pts/output/MeanChiSquareAndSimilarityInput \
>   /user/pts/output/MeanChiSquaredCalcOutput
>
> The directory /user/pts/output/MeanChiSquareAndSimilarityInput has an
> input file of 23 million records. File size is ~3 GB.
>
> Code:
> https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135
>
> --
> Madhav Sharan
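For readers following the split-size advice above: with FileInputFormat, the number of map tasks is roughly the file size divided by the split size, and fewer tasks than container slots means idle nodes. A minimal back-of-the-envelope sketch (the 3 GB file size is from the thread; the 128 MB default split size and the node/vcore counts read off the table above are assumptions):

```python
# Sketch: estimate map-task parallelism from input size and split size.
# Assumes FileInputFormat-style splitting (one map task per input split).
import math

def num_splits(file_size_bytes, split_size_bytes):
    """Approximate number of input splits, and hence map tasks."""
    return math.ceil(file_size_bytes / split_size_bytes)

GB = 1024 ** 3
MB = 1024 ** 2

file_size = 3 * GB           # ~3 GB input file from the thread
default_split = 128 * MB     # common default HDFS block / split size

# 9 nodes x 8 vcores (from the utilization table) = 72 container slots,
# but a 3 GB file at 128 MB splits yields only 24 map tasks:
print(num_splits(file_size, default_split))   # -> 24
print(num_splits(file_size, 42 * MB))         # -> 74, enough for 72 slots
```

If that estimate matches what you see, lowering `mapreduce.input.fileinputformat.split.maxsize` (e.g. via `-D` on the command line) would produce more, smaller splits and spread work across all nodes; the trade-off Sunil points out is that splits made too small produce short-running containers with high per-task overhead, so aim for tasks that still run at least a minute or so.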
