We have an application that uses DFSInputStream to read blocks from a
*remote* Hadoop cluster. Is there any way we can influence which specific
datanode a block fetch request is dispatched to?

Our reasoning is this: our application's workload is very IO-heavy, so we
would like to distribute the IO load as evenly as possible across the
hosts and disks. Before reading any data, we therefore want to obtain the
locations of the underlying blocks and build a dispatch plan that
maximizes IO throughput on the HDFS cluster.
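To make the idea concrete, here is a rough sketch of the planning step only. It assumes we already have, for each block, its length and the hosts holding a replica. In the Java client, something like FileSystem#getFileBlockLocations could supply this, since it returns BlockLocation objects that expose getHosts() and getLength(). The function name, block ids, hostnames and sizes below are hypothetical. The sketch greedily sends each block to its least-loaded replica host, placing the largest blocks first. It does not address the actual question of whether DFSInputStream can be told to use the chosen datanode.

```python
from collections import defaultdict


def build_dispatch_plan(block_hosts):
    """Greedily assign each block to the replica host with the fewest
    bytes planned so far (hypothetical helper, illustration only).

    block_hosts: dict mapping a block id to (length, [replica hosts]).
    Returns a dict mapping block id -> chosen host.
    """
    load = defaultdict(int)  # bytes planned per host so far
    plan = {}
    # Place the largest blocks first; smaller ones then fill the gaps.
    for block_id, (length, hosts) in sorted(
        block_hosts.items(), key=lambda kv: kv[1][0], reverse=True
    ):
        # Least-loaded replica wins; ties broken by hostname for determinism.
        host = min(hosts, key=lambda h: (load[h], h))
        plan[block_id] = host
        load[host] += length
    return plan


# Hypothetical example: sizes in MB, three datanodes.
blocks = {
    "blk_1": (128, ["dn1", "dn2"]),
    "blk_2": (128, ["dn1", "dn2"]),
    "blk_3": (64, ["dn1", "dn3"]),
}
print(build_dispatch_plan(blocks))
# {'blk_1': 'dn1', 'blk_2': 'dn2', 'blk_3': 'dn3'}
```

This balances bytes per host rather than per disk. Since BlockLocation reports hosts, per-disk balancing would need more detailed placement information than this sketch assumes.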

How do we go about this?

-- 
Thanks
Shivram
