Hi Madhav,

Could you share some more information here? When you say a few nodes
are not utilized, is it always the same nodes that sit idle?

Also, how long does each of these containers run on average? Please make
sure you have configured a large enough split size so that the containers
are not short-running.
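
A quick back-of-the-envelope check may explain the idle nodes: the number of map containers is bounded by the number of input splits. The sketch below assumes the default 128 MB split size (not confirmed in this thread) together with the numbers visible in your message (a ~3 GB input file and, from the utilization table, 9 nodes with 8 vcores each):

```python
# Hedged sketch: estimate map-task count vs. cluster capacity.
# Assumptions (not confirmed by the thread): 128 MB split size,
# ~3 GB input, 9 nodes x 8 vcores as shown in the node table.

file_size = 3 * 1024**3        # ~3 GB input file
split_size = 128 * 1024**2     # assumed default split size (128 MB)
nodes, vcores_per_node = 9, 8  # from the utilization table

map_tasks = -(-file_size // split_size)   # ceiling division
total_slots = nodes * vcores_per_node

print(map_tasks, total_slots)  # -> 24 72
```

If the job really does produce only ~24 map tasks against 72 vcore slots, some nodes will inevitably stay empty, so the split size is the first thing to check before looking at scheduler configuration.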

Thanks
Sunil

On Tue, Aug 9, 2016 at 4:49 AM Madhav Sharan <[email protected]> wrote:

> Hi Hadoop users,
>
> I am running an m/r job with an input file of 23 million records. I can see
> that not all of our nodes are getting used.
>
> What can I change to utilize all nodes?
>
>
> Containers  Mem Used  Mem Avail  Vcores Used  Vcores Avail
>          8  11.25 GB        0 B            8             0
>          0       0 B   11.25 GB            0             8
>          0       0 B   11.25 GB            0             8
>          8  11.25 GB        0 B            8             0
>          8  11.25 GB        0 B            8             0
>          7  11.25 GB        0 B            7             1
>          5   7.03 GB    4.22 GB            5             3
>          0       0 B   11.25 GB            0             8
>          0       0 B   11.25 GB            0             8
>
>
> My command looks like -
>
> hadoop jar
> target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar
> gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation 
> /user/pts/output/MeanChiSquareAndSimilarityInput
> /user/pts/output/MeanChiSquaredCalcOutput
>
> The directory */user/pts/output/MeanChiSquareAndSimilarityInput* contains an
> input file of 23 million records. The file size is ~3 GB.
>
> Code -
> https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135
>
>
> --
> Madhav Sharan
>
>
