Hi all,
I'm quite new to Hadoop and have only worked with a single-node setup so far.
I wrote a local driver that submits jobs to my cluster. I instantiate a single
Configuration instance right at the start of my process and pass it around
like this:
public static void main(String[] args) {
    int exitCode = ERR_INVALID_ARGS.get();
    Configuration conf = new Configuration(true);
    try {
        // A leading '-' selects a single tool; otherwise crunch the full dataset.
        if (args.length > 0 && args[0].startsWith("-")) {
            exitCode = runSelectedTool(conf, args);
        } else if (args.length >= 2 && args.length <= 3) {
            exitCode = crunchFullDataset(conf, args);
        }
    } catch (Exception e) {
        e.printStackTrace();
        exitCode = ERR_FATAL.get();
    }
    System.exit(exitCode);
}
private static int runSelectedTool(Configuration conf, String[] args)
        throws Exception {
    // Default to the invalid-arguments code when the switch is not recognized.
    int exitCode = ERR_INVALID_ARGS.get();
    String toolSwitch = args[0];
    // Drop the switch so that each tool only sees its own arguments.
    args = Arrays.copyOfRange(args, 1, args.length);
    if (SWITCH_FORMATTER_COUNTER.equals(toolSwitch)) {
        exitCode = ToolRunner.run(conf, new FormatterCounter(), args);
    } else if (SWITCH_CANDIDATES_FILTER.equals(toolSwitch)) {
        exitCode = ToolRunner.run(conf, new CandidatesFilter(), args);
    }
    return exitCode;
}
Prior to this, I was instantiating a new Configuration object each time I
called ToolRunner.run(), but now I use conf.set() and conf.get() to pass values
between jobs. Is this a bad idea (and if so, why), or is it the right way to
proceed?
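To make the question concrete, here is a stripped-down sketch of the two
approaches. It uses java.util.Properties as a stand-in for Hadoop's
Configuration so that it runs without a cluster. The class name, the key and
the two "job" methods are hypothetical and only illustrate the sharing
behaviour, not my real tools.

```java
import java.util.Properties;

public class SharedConfSketch {
    // Hypothetical key used to hand a value from the first job to the second.
    static final String KEY_RECORD_COUNT = "sketch.record.count";

    // Stand-in for a first job: writes a result into the configuration it was given.
    static int runCounterJob(Properties conf) {
        conf.setProperty(KEY_RECORD_COUNT, "42");
        return 0;
    }

    // Stand-in for a second job: reads the value left by the first job, if any.
    static String runFilterJob(Properties conf) {
        return conf.getProperty(KEY_RECORD_COUNT, "unset");
    }

    // Copies every entry so that later writes to the copy do not reach the original.
    static Properties copyOf(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf);
        return copy;
    }

    public static void main(String[] args) {
        // Shared instance: the second job sees what the first job wrote.
        Properties shared = new Properties();
        runCounterJob(shared);
        System.out.println(runFilterJob(shared)); // prints "42"

        // Fresh copy per job: the first job's write stays in its own copy.
        Properties base = new Properties();
        runCounterJob(copyOf(base));
        System.out.println(runFilterJob(base)); // prints "unset"
    }
}
```

With the shared instance, every value a job sets stays visible to all later
jobs, which is what I rely on now. With a copy per job (in Hadoop this would
presumably be new Configuration(conf)), nothing leaks between jobs, but I
would need another way to pass results along.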
Many thanks,
Pierre