We have an application that uses DFSInputStream to read blocks from a *remote* Hadoop cluster. Is there any way to influence which specific datanode a block fetch request is dispatched to?
The reasoning is that our application workload is very IO-heavy, so we would like to distribute the IO load as evenly as possible across the hosts and disks. Before reading any data, we want to obtain the locations of the underlying blocks and build a dispatch plan that maximizes IO throughput on the HDFS cluster. How do we go about this?

-- Thanks, Shivram
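
To make the dispatch-plan idea concrete, here is a minimal sketch of what we have in mind, under stated assumptions. It presumes the replica hosts for each block have already been collected into a map (in Hadoop this information should be available through `FileSystem.getFileBlockLocations` and `BlockLocation.getHosts()`). The class name `DispatchPlanner`, the block IDs and the host names are hypothetical, and the greedy least-loaded choice is just one possible policy. It does not show how to make the client actually read from the chosen datanode, which is the open question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Hadoop. It only plans which replica to read.
public class DispatchPlanner {

    /**
     * For each block (in iteration order), pick the replica host that has the
     * fewest reads assigned so far. Ties go to the host listed first.
     */
    public static Map<String, String> plan(Map<String, List<String>> replicasByBlock) {
        Map<String, Integer> load = new HashMap<>();          // host -> reads assigned
        Map<String, String> assignment = new LinkedHashMap<>(); // block -> chosen host
        for (Map.Entry<String, List<String>> entry : replicasByBlock.entrySet()) {
            String best = null;
            for (String host : entry.getValue()) {
                if (best == null
                        || load.getOrDefault(host, 0) < load.getOrDefault(best, 0)) {
                    best = host;
                }
            }
            if (best != null) { // skip blocks with no known replica
                assignment.put(entry.getKey(), best);
                load.merge(best, 1, Integer::sum);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Assumed input: block ID -> hosts holding a replica (made-up values).
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        replicas.put("blk_1", Arrays.asList("dn1", "dn2", "dn3"));
        replicas.put("blk_2", Arrays.asList("dn1", "dn2", "dn4"));
        replicas.put("blk_3", Arrays.asList("dn1", "dn3", "dn4"));
        System.out.println(plan(replicas));
    }
}
```

A real plan would probably also need to weigh block sizes and per-disk placement, which this sketch ignores.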
