Hi,
The following snippet lets me iterate over each character of a file in HDFS:
// Opening the file
Configuration conf = new Configuration();
FSDataInputStream in = null;
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path(args[0]);
in = fs.open(inFile);
// Reading the file
Reader reader = new BufferedReader(new InputStreamReader(in,
StandardCharsets.UTF_8));
int c = 0;
while ((c = reader.read()) != -1) {
System.out.println((char)c);
}
But I imagine this is inefficient because of the BufferedReader.
I tried something like this instead:
Configuration conf = new Configuration();
FSDataInputStream in = null;
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path(args[0]);
in = fs.open(inFile);
ByteBuffer x = ByteBuffer.allocate(655360);
int length = in.read(x);
while (length > 0) {
int c = 0;
while (c < length) {
System.out.println(x.getChar(c));
c++;
}
x.clear();
length = in.read(x);
}
Although this is significantly faster, it does not print the correct
characters.
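One likely cause, as far as I can tell: ByteBuffer.getChar(index) does not decode UTF-8. It reads the two bytes at that absolute index and combines them into a single big-endian 16-bit char, so ASCII bytes get paired up into unrelated characters, and stepping the index by one makes the pairs overlap. Below is a minimal sketch of a decoder-based variant, assuming the file is UTF-8. The names Utf8ChunkDecoder, misread and forEachChar are hypothetical, and a plain InputStream stands in for the FSDataInputStream (which is itself an InputStream), so the sketch runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

// Hypothetical helper class, not part of Hadoop.
final class Utf8ChunkDecoder {

    // Reproduces the second snippet's read: two raw bytes at an absolute
    // index are combined into one big-endian 16-bit char, with no decoding.
    static char misread(byte[] raw, int index) {
        return ByteBuffer.wrap(raw).getChar(index);
    }

    // Reads the stream in chunks and passes every decoded UTF-16 code unit
    // to the handler. Assumes the input is UTF-8; malformed input throws.
    static void forEachChar(InputStream in, int bufferSize, IntConsumer handler)
            throws IOException {
        if (bufferSize < 4) {
            // A UTF-8 sequence is at most 4 bytes; a smaller buffer could stall.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        ReadableByteChannel channel = Channels.newChannel(in);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);

        boolean eof = false;
        while (!eof) {
            eof = channel.read(bytes) == -1;
            bytes.flip();
            CoderResult result;
            do {
                result = decoder.decode(bytes, chars, eof);
                drain(chars, handler);
            } while (result.isOverflow());
            if (result.isError()) {
                result.throwException();
            }
            // Keep the bytes of a sequence that was split across two chunks.
            bytes.compact();
        }
        CoderResult flushed;
        do {
            flushed = decoder.flush(chars);
            drain(chars, handler);
        } while (flushed.isOverflow());
    }

    // Hands the decoded chars to the handler and resets the buffer.
    private static void drain(CharBuffer chars, IntConsumer handler) {
        chars.flip();
        while (chars.hasRemaining()) {
            handler.accept(chars.get());
        }
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        forEachChar(new ByteArrayInputStream(data), 4,
                c -> System.out.println((char) c));
    }
}
```

With HDFS, the stream returned by fs.open(inFile) could be passed in place of the ByteArrayInputStream. The handler receives UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two calls (a surrogate pair).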
What is the best way to iterate over each character of a file stored in
HDFS?
Thanks,
--
Pratyush Das