Hi Denny,
Thanks a lot. I was able to make my code work.
I am posting a small example below, in case somebody has a similar need in
the future ;-)
(it does not handle replica datablocks).
David.
***************************************************************************
public static void main(String[] args) {
    String filename = "/user/hive/warehouse/sample_07/sample_07.csv";
    int DATANODE_PORT = 50010;
    int NAMENODE_PORT = 8020;
    String HOST_IP = "192.168.1.230";
    byte[] buf = new byte[1000];
    try {
        // Ask the NameNode for the block layout of the file
        ClientProtocol client = DFSClient.createNamenode(
                new InetSocketAddress(HOST_IP, NAMENODE_PORT), new Configuration());
        LocatedBlocks located = client.getBlockLocations(filename, 0, Long.MAX_VALUE);
        for (LocatedBlock block : located.getLocatedBlocks()) {
            // Connect directly to a DataNode and stream the block from it
            Socket sock = SocketFactory.getDefault().createSocket();
            InetSocketAddress targetAddr = new InetSocketAddress(HOST_IP, DATANODE_PORT);
            NetUtils.connect(sock, targetAddr, 10000);
            sock.setSoTimeout(10000);
            BlockReader reader = BlockReader.newBlockReader(sock,
                    filename,
                    block.getBlock().getBlockId(),
                    block.getBlockToken(),
                    block.getBlock().getGenerationStamp(),
                    0,                       // offset into the block
                    block.getBlockSize(),
                    1000);                   // buffer size
            int length;
            // read() may return fewer bytes than requested without being at the
            // end of the block, so loop until it signals end-of-block (<= 0)
            while ((length = reader.read(buf, 0, buf.length)) > 0) {
                // System.out.print(new String(buf, 0, length, "UTF-8"));
            }
            reader.close();
            sock.close();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
***************************************************************************
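The read loop in the example above is the standard short-read-tolerant drain
pattern, so it can be exercised without a cluster. Below is a minimal
self-contained sketch of that same loop, using a plain ByteArrayInputStream as
a stand-in for the BlockReader (the class name DrainLoop and the 2500-byte
payload are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DrainLoop {
    // Same loop shape as in the example: read() may return short counts,
    // so keep reading until it returns <= 0 (end of stream/block).
    static byte[] drain(InputStream in, int bufSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        int length;
        while ((length = in.read(buf, 0, buf.length)) > 0) {
            out.write(buf, 0, length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2500]; // pretend this is one block's content
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 127);
        byte[] copy = drain(new ByteArrayInputStream(data), 1000);
        System.out.println(copy.length); // 2500
    }
}
```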
From: Denny Ye <[email protected]>
Reply-To: <[email protected]>
Date: Mon, 9 Jan 2012 16:29:18 +0800
To: <[email protected]>
Subject: Re: How-to use DFSClient's BlockReader from Java
hi David, please refer to the method "DFSInputStream#blockSeekTo"; it has
the same purpose as your code.
***************************************************************************
LocatedBlock targetBlock = getBlockAt(target, true);
assert (target == this.pos) : "Wrong postion " + pos + " expect " + target;
long offsetIntoBlock = target - targetBlock.getStartOffset();

DNAddrPair retval = chooseDataNode(targetBlock);
chosenNode = retval.info;
InetSocketAddress targetAddr = retval.addr;

try {
    s = socketFactory.createSocket();
    NetUtils.connect(s, targetAddr, socketTimeout);
    s.setSoTimeout(socketTimeout);
    Block blk = targetBlock.getBlock();
    Token<BlockTokenIdentifier> accessToken = targetBlock.getBlockToken();

    blockReader = BlockReader.newBlockReader(s, src,
            blk.getBlockId(),
            accessToken,
            blk.getGenerationStamp(),
            offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
            buffersize, verifyChecksum, clientName);
***************************************************************************
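The offset arithmetic in the snippet above can be checked standalone. A
minimal sketch, assuming a fixed (hypothetical) 64 MB block size and a made-up
class name BlockOffsets; the real code reads getStartOffset() and
getNumBytes() from the LocatedBlock instead of computing them:

```java
public class BlockOffsets {
    // Hypothetical fixed block size for illustration only.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Index of the block containing byte `target` of the file,
    // assuming equal-sized blocks.
    static long blockIndex(long target) {
        return target / BLOCK_SIZE;
    }

    // Same arithmetic as blockSeekTo: the offset of `target` within
    // its block, given that block's start offset in the file.
    static long offsetIntoBlock(long target, long blockStartOffset) {
        return target - blockStartOffset;
    }

    public static void main(String[] args) {
        long target = 100L * 1024 * 1024;             // seek to byte 100 MB
        long idx = blockIndex(target);                // block 1
        long start = idx * BLOCK_SIZE;                // block 1 starts at 64 MB
        long within = offsetIntoBlock(target, start); // 36 MB into the block
        System.out.println(idx + " " + within);
    }
}
```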
-Regards
Denny Ye
2012/1/6 David Pavlis <[email protected]>
Hi,
I am relatively new to Hadoop and I am trying to use HDFS for my own
application, where I want to take advantage of the data partitioning HDFS
performs.
The idea is that I get the list of individual blocks - the BlockLocations -
of a particular file and then read those directly (going to the individual
DataNodes).
So far I have found org.apache.hadoop.hdfs.DFSClient.BlockReader to be the
way to go.
However, I am struggling with instantiating the BlockReader class, namely
with creating the "Token<BlockTokenIdentifier>".
Is there example Java code showing how to access the individual blocks of a
particular file stored on HDFS?
Thanks in advance,
David.