This question pertains to Hadoop core 1.2.1 and HBase 1.2.3.
I wrote a simple map/reduce job that looks like this:
- The input to the mapper is whole HDFS files, one file at a time, via a custom InputFormat
- The output of the mapper is <LongWritable, Text>
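For reference, my whole-file input format follows the usual pattern for the old mapred API: mark files as non-splittable and have the record reader emit the entire file as one record. This is a simplified sketch, not my exact class; in particular the NullWritable key type and the inner class name are just illustrative choices here:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // each file becomes exactly one split, hence one map record
    }

    @Override
    public RecordReader<NullWritable, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }

    static class WholeFileRecordReader implements RecordReader<NullWritable, Text> {
        private final FileSplit split;
        private final JobConf job;
        private boolean processed = false;

        WholeFileRecordReader(FileSplit split, JobConf job) {
            this.split = split;
            this.job = job;
        }

        @Override
        public boolean next(NullWritable key, Text value) throws IOException {
            if (processed) {
                return false; // only one record per file
            }
            // Read the entire file into the value in one shot.
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(job);
            FSDataInputStream in = fs.open(file);
            try {
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public NullWritable createKey() { return NullWritable.get(); }
        @Override public Text createValue() { return new Text(); }
        @Override public long getPos() { return processed ? split.getLength() : 0; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}
```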
The job is configured like this:
Configuration conf = getConf();
JobConf cfg = new JobConf(conf, FindMissionTimeJob.class);
/* Set up mapper */
Path inputPath = new Path(args[0]);
WholeFileInputFormat.setInputPaths(cfg, inputPath);
cfg.setNumMapTasks(1);
cfg.setMapperClass(TimeMapper.class);
cfg.setInputFormat(WholeFileInputFormat.class);
cfg.setMapOutputKeyClass(LongWritable.class);
cfg.setMapOutputValueClass(Text.class);
cfg.setNumReduceTasks(1);
TableMapReduceUtil.initTableReduceJob(tableName, TimeReducer.class, cfg);
When I run it, I get an NPE here:
16/10/07 15:33:55 INFO mapreduce.Job: Task Id : attempt_1475608557171_0021_r_000000_2, Status : FAILED
Error: java.lang.NullPointerException
    at org.apache.hadoop.mapred.Task.getFsStatistics(Task.java:373)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
Tracing the source of that, it really comes down to this bit of code in
ReduceTask.java that is failing:
if (job.getOutputFormat() instanceof FileOutputFormat) {
    matchedStats = getFsStatistics(FileOutputFormat.getOutputPath(job), job);
}
getFsStatistics() throws an NPE because there is no output path. The problem is, we should not even get there, because initTableReduceJob() already set the output format to this:
job.setOutputFormat(TableOutputFormat.class);
Looking at the source for that guy:
public class TableOutputFormat extends FileOutputFormat<ImmutableBytesWritable, Put>
Ahh hah. There's the problem: TableOutputFormat extends FileOutputFormat, so the instanceof check matches even though a table reducer is not really a file output, and getFsStatistics() gets invoked and blows up on the missing output path. It seems likely that getFsStatistics() should check for null here, and perhaps TableOutputFormat should not extend FileOutputFormat - I'm not sure.
The way I got around this was to create my own OutputFormat, MyTableOutputFormat, which is essentially a copy of the real TableOutputFormat, but instead of extending FileOutputFormat I did this:
public class MyTableOutputFormat implements OutputFormat<ImmutableBytesWritable, Put> { ... }
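Stripped down, the workaround class looks roughly like the following. This is a sketch from memory rather than my exact copy (the real TableOutputFormat has more setup in its record writer, and I'm assuming the "hbase.mapred.outputtable" key here matches what initTableReduceJob sets), but it shows the essential change: implementing OutputFormat directly so the instanceof FileOutputFormat check in ReduceTask no longer matches:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

public class MyTableOutputFormat implements OutputFormat<ImmutableBytesWritable, Put> {

    /** JobConf key naming the target table (assumed to match the real TableOutputFormat's key). */
    public static final String OUTPUT_TABLE = "hbase.mapred.outputtable";

    @Override
    public RecordWriter<ImmutableBytesWritable, Put> getRecordWriter(
            FileSystem ignored, JobConf job, String name, Progressable progress)
            throws IOException {
        // Open a connection to the configured table; each Put from the
        // reducer is written straight through to HBase.
        final Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create(job));
        final Table table = conn.getTable(TableName.valueOf(job.get(OUTPUT_TABLE)));
        return new RecordWriter<ImmutableBytesWritable, Put>() {
            @Override
            public void write(ImmutableBytesWritable key, Put put) throws IOException {
                table.put(put);
            }

            @Override
            public void close(Reporter reporter) throws IOException {
                table.close();
                conn.close();
            }
        };
    }

    @Override
    public void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException {
        if (job.get(OUTPUT_TABLE) == null) {
            throw new IOException("Must specify " + OUTPUT_TABLE);
        }
    }
}
```

With this in place, I just call cfg.setOutputFormat(MyTableOutputFormat.class) after initTableReduceJob() to override the one it installed.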
I don't know if that's a correct or incorrect solution... just that it runs
an empty hbase-outputting reducer without blowing up on an NPE.
I would appreciate any comments/feedback on the problem as well as my
workaround, and whether or not anybody else has encountered this.
--Chris