Hello,

I'm having a problem running MR jobs remotely; the reason for doing this is to be able to run integration tests. I have a jar which I usually run on a YARN cluster using

yarn jar ... (or hadoop jar ...)

Now I want to write integration tests, so I created a separate (Maven) project that includes the jar containing my MR jobs. The only difference is that I now run the job with a Configuration created in the following way:

Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.x.x:8050");
conf.set("fs.default.name", "192.168.x.x");
conf.set("mapreduce.framework.name", "yarn");

Additionally, to be able to run the job as a different user, I use UGI in the following way:

UserGroupInformation ugi = UserGroupInformation.createRemoteUser("username");

ugi.doAs((PrivilegedAction<Void>) () -> {
    myJobDriver.runJob(conf);
    return null; // PrivilegedAction<Void> still has to return something
});
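
(If runJob declared checked exceptions, say an IOException from waitForCompletion, I assume I would need the PrivilegedExceptionAction overload of doAs instead, along these lines:)

import java.security.PrivilegedExceptionAction;

ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
    myJobDriver.runJob(conf); // free to throw checked exceptions here
    return null;
});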

At first I did not set the framework name, and the strange thing was that I got output created on HDFS, but the ResourceManager web console showed no record of my job ever running on the cluster. That made me think the job actually ran locally, while its input was read from and its output written to the cluster's HDFS.
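
If it helps the diagnosis, I believe the job ID gives away where a job ran: the LocalJobRunner assigns IDs of the form job_local..., while a YARN submission gets an ID derived from the cluster timestamp. Something along these lines would confirm it (a sketch, not my actual driver code):

import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance(conf);
// ... mapper/reducer and input/output paths set as the driver normally does ...
job.submit();
// job_local..._0001 means LocalJobRunner; job_<timestamp>_0001 means it went to the cluster
System.out.println("Submitted as: " + job.getJobID());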

Now that I have added the framework name, I get a "java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses." exception.
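
From searching around, my understanding is that this IOException is thrown by Cluster.initialize() when no ClientProtocolProvider on the classpath accepts "yarn" as the framework name, and that the YARN provider (YarnClientProtocolProvider) ships in hadoop-mapreduce-client-jobclient. Is a dependency like the following (with the version matching the cluster) what my test project is missing?

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>${hadoop.version}</version>
</dependency>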

What seems to be the problem, and how can I fix it?

Additionally, could anyone comment on whether this is a good way to perform integration testing on Hadoop?

Many thanks,
Marko
