Hi,
I am using Cloudera's Hadoop distribution (Hadoop 0.20.2-cdh3u3) and am trying to use MultipleInputs with a separate mapper class per input path, in the following manner:
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(IntegrateExisting.class);
    conf.setJobName("IntegrateExisting");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    Path existingKeysInputPath = new Path(args[0]);
    Path newKeysInputPath = new Path(args[1]);
    Path outputPath = new Path(args[2]);

    MultipleInputs.addInputPath(conf, existingKeysInputPath,
        TextInputFormat.class, MapExisting.class);
    MultipleInputs.addInputPath(conf, newKeysInputPath,
        TextInputFormat.class, MapNew.class);

    conf.setCombinerClass(ReduceAndFilterOut.class);
    conf.setReducerClass(ReduceAndFilterOut.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileOutputFormat.setOutputPath(conf, outputPath);
    //FileInputFormat.addInputPath(conf, existingKeysInputPath);
    //FileInputFormat.addInputPath(conf, newKeysInputPath);

    JobClient.runJob(conf);
}
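One thing I wonder about (this is only a guess on my part, not something I have confirmed in the Hadoop sources): JobConf is an ordinary key/value configuration, so the last setter of a property wins, and MultipleInputs.addInputPath installs DelegatingInputFormat as the job's input format. If that is right, my later conf.setInputFormat(TextInputFormat.class) call would silently replace it. A minimal sketch of that last-write-wins behavior, using a plain HashMap as a hypothetical stand-in for JobConf (the property name is illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf: properties are plain key/value pairs,
// so whichever call sets the input-format property last wins.
public class InputFormatOverrideSketch {
    // Illustrative property name, not necessarily the real JobConf key.
    static final String KEY = "mapred.input.format.class";

    // Mimics MultipleInputs.addInputPath(...) followed by a later
    // conf.setInputFormat(TextInputFormat.class) call.
    static String resultingInputFormat() {
        Map<String, String> conf = new HashMap<>();
        // MultipleInputs.addInputPath(...) installs DelegatingInputFormat.
        conf.put(KEY, "org.apache.hadoop.mapred.lib.DelegatingInputFormat");
        // A later setInputFormat call overwrites that entry.
        conf.put(KEY, "org.apache.hadoop.mapred.TextInputFormat");
        return conf.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println("job will use: " + resultingInputFormat());
    }
}
```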
With the commented-out FileInputFormat.addInputPath lines left out, as shown above, the MR job fails with the following error:
12/07/05 16:59:25 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: No input paths specified in job
Exception in thread "main" java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:153)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:971)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:963)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
    at org.myorg.IntegrateExisting.main(IntegrateExisting.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Uncommenting those two lines instead leads to the following error in the mappers:
java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to org.apache.hadoop.mapred.lib.TaggedInputSplit
    at org.apache.hadoop.mapred.lib.DelegatingMapper.map(DelegatingMapper.java:48)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
I see that the fix for MAPREDUCE-1178, which discusses the second error, is included in the CDH3 version. Is there any code missing from the snippet above?
Thanks for the help.
Regards,
Sanchita