[Rd] Unable to execute Java MapReduce (Hadoop) code from R using rJava
Hi All,

I have written a Java MapReduce job that runs on Hadoop. My intention is to create an R package that calls the Java code and executes the job, so I have written a wrapper R function. But when I call this function from the R terminal, the Hadoop job does not run: it just prints a few warning lines and does nothing further. Here is the execution scenario:

> source("mueda.R")
> mueda(analysis="eda", input="/user/root/sample1.txt", output="/user/root/eda_test", columns=c("0", "1"), columnSeparator=",")
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>

These warnings are also common when the Hadoop job is run normally (without R), but in that case the job then executes properly. What might be the cause that I am unable to execute the job from R using rJava, and why is there no error message either?

Below is the R code that uses rJava to call the Java code in the backend:

library(rJava)

mueda <- function(analysis = "eda",
                  input = NULL,
                  output = NULL,
                  columns = NULL,
                  columnSeparator = NULL,
                  cat.cut.off = 50,
                  percentilePoints = c(0, 0.01, 0.05, 0.1, 0.25, 0.50,
                                       0.75, 0.90, 0.95, 0.99, 1),
                  histogram = FALSE) {

  if (is.null(input) || is.null(output) || is.null(columns) ||
      is.null(columnSeparator)) {
    stop("Usage: mueda(<analysis>, <input>, <output>, <columns>, ",
         "<columnSeparator>, [<cat.cut.off>], [<percentilePoints>], ",
         "[<histogram>])")
  }

  # Gets the absolute path of the external JARs
  #pkgPath <- paste(system.file(package="muEDA"), "/jars", sep="")
  pkgPath <- paste("../inst", "/jars", sep="")

  # Initializes the JVM, putting the JAR directory on the class path:
  .jinit(pkgPath)

  # Adds all the required JARs to the class path:
  .jaddClassPath(paste(pkgPath, "Eda.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-cli-1.2.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-hdfs-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "slf4j-log4j12-1.6.1.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-configuration-1.6.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "guava-11.0.2.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-mapreduce-client-core-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-lang-2.5.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-auth-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "log4j-1.2.17.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-logging-1.1.1.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-common-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "slf4j-api-1.6.1.jar", sep="/"))

  # Creates the R object for the main Java class:
  obj <- .jnew("EDA")

  # Concatenates the column names into one comma-separated argument for Java
  # (done before the branching so the "freq" branch can use it too):
  col <- paste(columns, collapse = ",")

  if (analysis == "eda" || analysis == "univ") {
    switch(analysis,
           # Calls the main Java class with the return type, method name and
           # parameters needed to perform EDA
           eda  = .jcall(obj, "V", "edaExecute", c("eda", input, output, col)),
           # The same call, for Univariate Analysis
           univ = .jcall(obj, "V", "edaExecute", c("univ", input, output, col)))
  } else if (analysis == "freq") {
    # The same call, for Frequency Analysis
    .jcall(obj, "V", "edaExecute", c("freq", input, output, col))
  } else {
    stop("Please provide either \"eda\", \"univ\" or \"freq\" for the analysis argument")
  }
}

Regards,
Gaurav
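A silent failure like the one above is the typical symptom of a Java exception being thrown and then discarded. A minimal sketch for surfacing it with rJava, assuming the JVM, class path, obj, input, output and col have been set up exactly as in mueda() above:

library(rJava)

# Run the call without rJava's automatic exception check, then fetch and
# print any pending Java exception instead of letting it be cleared silently.
.jcall(obj, "V", "edaExecute", c("eda", input, output, col), check = FALSE)
ex <- .jgetEx(clear = TRUE)
if (!is.null(ex)) {
  ex$printStackTrace()   # prints the full Java stack trace on the R console
}

If the trace points at missing Hadoop configuration, note that the "hadoop jar" launcher puts the cluster's configuration directory (core-site.xml etc.) on the class path automatically, while a JVM started by .jinit() does not see it unless it is added explicitly.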
[Rd] How to write R data frame to HDFS using rhdfs?
Hello,

I am trying to write the default "OrchardSprays" R data frame into HDFS using the "rhdfs" package. I want to write this data frame directly into HDFS without first storing it in a file on the local file system. Which rhdfs command should I use? Can someone help me? I am very new to R and rhdfs.

Regards,
Gaurav
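A minimal sketch with the RHadoop rhdfs package, assuming hdfs.init() can find the cluster (HADOOP_CMD set) and using hypothetical target paths; hdfs.write() serializes a non-raw R object and streams it straight into an HDFS file, so no local file is involved:

library(rhdfs)
hdfs.init()                                    # connect to HDFS

# Write the data frame as a serialized R object.
con <- hdfs.file("/user/root/OrchardSprays.robj", "w")
hdfs.write(OrchardSprays, con)                 # non-raw objects get serialized
hdfs.close(con)

# Alternative: store plain CSV text by handing hdfs.write() raw bytes,
# which it writes as-is.
csv <- paste(capture.output(write.csv(OrchardSprays, row.names = FALSE)),
             collapse = "\n")
con <- hdfs.file("/user/root/OrchardSprays.csv", "w")
hdfs.write(charToRaw(csv), con)
hdfs.close(con)

The serialized variant can be read back into R with hdfs.read() plus unserialize(); the CSV variant is readable by Hadoop jobs as ordinary text.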
[Rd] How to call Java main method using rJava package?
Hi,

I am trying to integrate my Java program with R using the rJava package. I want to execute the whole Java program from R. The main() method in my Java code calls all the other defined methods, so I guess I have to call the main() method in .jcall(). An example Java class:

class A {
    public static int mySum(int x, int y) {
        return x + y;
    }
    public static void main(String[] arg) {
        System.out.println("The sum is " + mySum(2, 4));
    }
}

I can do the following to call the mySum() method:

.jcall(obj, "I", "mySum", as.integer(2), as.integer(4))

This gives the output 6. But can someone explain how exactly I can execute this program from R so that it prints "The sum is 6", i.e. how can I call the main() method? I am a beginner in R.

Thanks,
Gaurav
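Because main() is static, .jcall() can be given the class name instead of an object reference, and the String[] parameter maps to a character array built with .jarray(). A minimal sketch, assuming A.class sits in a hypothetical directory passed to .jinit():

library(rJava)
.jinit(classpath = "path/to/compiled/classes")  # hypothetical location of A.class

# "V" is the JNI signature for a void return; main(String[]) receives
# an empty String array here.
.jcall("A", "V", "main", .jarray(character(0)))
# The embedded JVM shares the R process's stdout, so this normally
# prints: The sum is 6

The same call is also available through rJava's high-level interface as J("A")$main(.jarray(character(0))).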
[Rd] ClassNotFoundException when running distributed job using rJava package
Hi,

I have MapReduce Java code which I am calling from R using rJava. I have prepared the R package and tested it successfully. But when I deployed the package on a cluster and executed it, I got a ClassNotFoundException. If I run the same job directly, without the R integration, it runs perfectly.

Here is my R code:

library(rJava)

muMstSpark <- function(mesosMaster = NULL,
                       input = NULL,
                       output = NULL,
                       scalaLib = NULL,
                       sparkCore = NULL,
                       inputSplits = 8) {

  if (is.null(mesosMaster) || is.null(input) || is.null(output) ||
      is.null(scalaLib) || is.null(sparkCore)) {
    stop("Usage: muMstSpark(<mesosMaster>, <input>, <output>, ",
         "<scalaLib>, <sparkCore>, [<inputSplits>])")
  }

  # Gets the absolute path of the external Scala and Java JARs
  pkgPath <- paste(system.file(package="MuMstBig"), "/jars", sep="")

  # Initializes the JVM, putting the JAR directory on the class path:
  .jinit(pkgPath)

  # Adds all the required JARs to the class path:
  .jaddClassPath(paste(pkgPath, "Prims.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "MSTInSpark.jar", sep="/"))
  .jaddClassPath(scalaLib)
  .jaddClassPath(sparkCore)

  # Creates the R object for the main Java class:
  obj <- .jnew("MSTInSpark")

  # Calls the Java main class
  .jcall(obj, "V", "mst", c(mesosMaster, input, output, inputSplits))
}

Here is the error log:

13/02/08 00:54:48 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: Prims$$anonfun$PrimsExecute$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at spark.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:20)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputS
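The trace shows the class loader failing during task deserialization on a Spark worker, not on the R side: Prims$$anonfun$PrimsExecute$1 is a closure class compiled into Prims.jar, and .jaddClassPath() only affects the JVM embedded in the R process (the driver). The jar also has to be shipped to the workers, for example via the jars argument that the SparkContext constructor of that Spark generation accepts inside MSTInSpark. A driver-side sanity-check sketch, assuming the package layout above (paths are hypothetical):

library(rJava)

# Confirm the packaged jars resolve before handing control to Spark;
# system.file() returns "" when the package is not installed on this node.
pkgPath <- file.path(system.file(package = "MuMstBig"), "jars")
jars <- file.path(pkgPath, c("Prims.jar", "MSTInSpark.jar"))
missing <- jars[!file.exists(jars)]
if (length(missing) > 0) stop("missing jar(s): ", paste(missing, collapse = ", "))

.jinit(classpath = jars)
print(.jclassPath())   # the class path the embedded JVM actually sees

Even with the driver class path correct, the ClassNotFoundException will persist until the worker JVMs can fetch Prims.jar as well.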