Hi all,
I am having issues using SequenceFileInputFormat to retrieve whole records.
I have one job that writes to a SequenceFile:
SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
SequenceFileOutputFormat.setOutputCompressionType(job,
    SequenceFile.CompressionType.NONE);
I then have a second job that is meant to read the file for processing:
SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
However, the values that I get as arguments to the map function of my job
only seem to contain parts of each record. I am sure that I am missing
something rather fundamental about how Hadoop splits input for the Mapper,
but I can't seem to find a way to stop the records being split.
Any help (or a pointer to a specific page in the docs) would be greatly
appreciated.
Regards,
Tim