[Rd] Unable to execute Java MapReduce (Hadoop) code from R using rJava

2013-09-23 Thread Gaurav Dasgupta
Hi All,

I have written Java MapReduce code that runs on Hadoop. My intention is to
create an R package that calls the Java code and executes the job, so I
have written a corresponding R function. But when I call this function from
the R terminal, the Hadoop job does not run; it just prints a few lines of
warning messages and does nothing further. Here is the execution scenario:

> source("mueda.R")
> mueda(analysis="eda", input="/user/root/sample1.txt",
        output="/user/root/eda_test", columns=c("0", "1"), columnSeparator=",")
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
>

These warning messages also appear when the Hadoop job is run normally
(without R), but in that case the job still executes properly. What might be
the reason I am unable to execute the job from R using rJava, and why is
there no error message either?

Below is the R code that uses rJava to call the Java code in the backend:

library(rJava)

mueda <- function(analysis = "eda",
                  input = NULL,
                  output = NULL,
                  columns = c(NULL),
                  columnSeparator = NULL,
                  cat.cut.off = 50,
                  percentilePoints = c(0, 0.01, 0.05, 0.1, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 1),
                  histogram = FALSE) {

  if (is.null(input) || is.null(output) || is.null(columns) || is.null(columnSeparator)) {
    stop("Usage: mueda(<analysis>, <input>, <output>, <columns>, <columnSeparator>, [<cat.cut.off>], [<percentilePoints>], [<histogram>])")
  }

  # Gets the absolute path of the external JARs
  #pkgPath = paste(system.file(package="muEDA"), "/jars", sep="")
  pkgPath = paste("../inst", "/jars", sep="")

  # Initializes the JVM specifying the directory where the main Java class resides:
  .jinit("pkgPath")

  # Adds all the required JARs to the class path:
  .jaddClassPath(paste(pkgPath, "Eda.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-cli-1.2.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-hdfs-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "slf4j-log4j12-1.6.1.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-configuration-1.6.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "guava-11.0.2.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-mapreduce-client-core-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-lang-2.5.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-auth-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "log4j-1.2.17.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "commons-logging-1.1.1.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "hadoop-common-2.0.0-cdh4.3.0.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "slf4j-api-1.6.1.jar", sep="/"))

  # Creates the R object for the main Java class:
  obj <- .jnew("EDA")

  if ((analysis == "eda") || (analysis == "univ")) {

    # Concatenates the column names to pass as a single argument to Java
    col = columns[1]
    for (i in 2:length(columns)) {
      col = paste(col, columns[i], sep = ",")
    }

    switch(analysis,
           # Calls the Java class with the "return type", "method name" and
           # "parameters to pass" to perform EDA
           eda = .jcall(obj, "V", "edaExecute", c("eda", input, output, col)),
           # Calls the Java class with the "return type", "method name" and
           # "parameters to pass" to perform Univariate Analysis
           univ = .jcall(obj, "V", "edaExecute", c("univ", input, output, col)))
  } else if (analysis == "freq") {

    # Calls the Java class with the "return type", "method name" and
    # "parameters to pass" to perform Frequency Analysis
    .jcall(obj, "V", "edaExecute", c("freq", input, output, col))
  } else if ((analysis != "eda") && (analysis != "univ") && (analysis != "freq")) {
    stop("Please provide either \"eda\" or \"univ\" or \"freq\" for the <analysis> argument")
  }
}
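
For reference, two details in the initialization above are worth flagging (observations,
not a confirmed fix): .jinit("pkgPath") passes the literal string "pkgPath" rather than the
pkgPath variable, and no Hadoop configuration directory (the one holding core-site.xml and
mapred-site.xml) is added to the class path, so the client may fall back to an empty default
configuration. A minimal, hedged variant of that initialization, assuming a CDH-style
configuration directory under /etc/hadoop/conf:

  # Sketch only: the configuration-directory path is an assumption and
  # differs per cluster (e.g. /etc/hadoop/conf on CDH).
  .jinit(classpath = pkgPath)         # pass the variable, not the string "pkgPath"
  .jaddClassPath("/etc/hadoop/conf")  # makes the *-site.xml files visible to the Hadoop client
  .jaddClassPath(list.files(pkgPath, pattern = "\\.jar$", full.names = TRUE))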

Regards,
Gaurav



[Rd] How to write R data frame to HDFS using rhdfs?

2013-10-09 Thread Gaurav Dasgupta
Hello,

I am trying to write the built-in "OrchardSprays" R data frame into HDFS
using the "rhdfs" package. I want to write the data frame directly into
HDFS without first storing it in a file on the local file system.

Which rhdfs command should I use? Can someone help me? I am very new to R
and rhdfs.
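
For reference, a minimal sketch of one way to do this with rhdfs, assuming the package is
installed and the HADOOP_CMD environment variable points at the hadoop binary (the paths
below are illustrative, and the exact behaviour of hdfs.write() with non-raw objects is
worth checking against the installed rhdfs version): hdfs.file() opens a write handle on
HDFS, and hdfs.write() pushes the bytes straight to it, so the data frame never touches
the local file system.

  Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")   # assumption: adjust to your install
  library(rhdfs)
  hdfs.init()

  # Option 1: store the data frame as a serialized R object on HDFS
  con <- hdfs.file("/user/root/OrchardSprays.robj", "w")
  hdfs.write(OrchardSprays, con)   # non-raw objects are serialized before writing
  hdfs.close(con)

  # Option 2: store it as plain CSV text by writing the raw bytes yourself
  csv <- paste(capture.output(write.csv(OrchardSprays, row.names = FALSE)),
               collapse = "\n")
  con <- hdfs.file("/user/root/OrchardSprays.csv", "w")
  hdfs.write(charToRaw(csv), con)
  hdfs.close(con)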

Regards,
Gaurav



[Rd] How to call Java main method using rJava package?

2013-01-16 Thread Gaurav Dasgupta
Hi,

I am trying to integrate my Java program with R using the rJava package. I want
to execute the whole Java program from R. The main() method in my Java code
calls all the other defined methods, so I guess I have to call the main()
method via .jcall().

An example Java class:

class A {
    public static int mySum(int x, int y) {
        return x + y;
    }
    public static void main(String[] arg) {
        System.out.println("The sum is " + mySum(2, 4));
    }
}

I can do the following to call the mySum() method:
.jcall(obj, "I", "mySum", as.integer(2), as.integer(4))
This gives the output 6.

But can someone explain how exactly I can execute this program from R so
that it prints "The sum is 6"? Or, how can I call the main() method?
I am a beginner in R.
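
For reference, a minimal sketch of one way to invoke main() (assuming A.class is on the
class path; the "path/to/classes" entry below is only a placeholder): because main() is
static, .jcall() can be given the class name instead of an object, the return type is
void ("V"), and the single argument is a String[], which rJava builds from a character
vector with .jarray().

  library(rJava)
  .jinit()
  .jaddClassPath("path/to/classes")   # placeholder: wherever A.class was compiled to

  # Static method, so pass the class name rather than an instance;
  # "V" means a void return, and main() receives an empty String[] here.
  .jcall("A", "V", "main", .jarray(character(0)))

The line printed by System.out.println() goes to the JVM's standard output, which
typically appears in the terminal R was started from (it may not show up in some GUI
front ends).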

Thanks,
Gaurav



[Rd] ClassNotFoundException when running distributed job using rJava package

2013-02-07 Thread Gaurav Dasgupta
Hi,

I have Java MapReduce code that I am calling from R using rJava. I have
prepared the R package and tested it successfully. But when I deploy the
package on a cluster and execute it, I get a ClassNotFoundException. If I
run the same job directly, without the R integration, it runs perfectly.
Here is my R code:

library(rJava)

muMstSpark <- function(mesosMaster = NULL, input = NULL, output = NULL,
                       scalaLib = NULL, sparkCore = NULL, inputSplits = 8) {
  if (is.null(mesosMaster) || is.null(input) || is.null(output) ||
      is.null(scalaLib) || is.null(sparkCore)) {
    stop("Usage: muMST(<mesosMaster>, <input>, <output>, <scalaLib>, <sparkCore>, [<inputSplits>])")
  }

  # Gets the absolute path of the external Scala and Java JARs
  pkgPath = paste(system.file(package="MuMstBig"), "/jars", sep="")

  # Initializes the JVM specifying the directory where the main Java class resides:
  .jinit("pkgPath")

  # Adds all the required JARs to the class path:
  .jaddClassPath(paste(pkgPath, "Prims.jar", sep="/"))
  .jaddClassPath(paste(pkgPath, "MSTInSpark.jar", sep="/"))
  .jaddClassPath(scalaLib)
  .jaddClassPath(sparkCore)

  # Creates the R object for the main Java class:
  obj <- .jnew("MSTInSpark")

  # Calls the Java main class
  .jcall(obj, "V", "mst", c(mesosMaster, input, output, inputSplits))
}
Here is the error log:

13/02/08 00:54:48 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: Prims$$anonfun$PrimsExecute$1
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 at spark.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:20)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
 at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
 at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputS
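
For reference, a hedged diagnostic sketch (an observation, not a confirmed fix):
.jaddClassPath() only changes the class path of the driver JVM that rJava starts inside R,
while the ClassNotFoundException above is thrown on a Mesos executor as it deserializes a
task, so the executors also need access to Prims.jar and MSTInSpark.jar (for example,
shipped by the Spark code inside MSTInSpark when it creates its SparkContext). The checks
below run only in the driver JVM and merely confirm that the classes resolve locally:

  # Sketch only: run after muMstSpark()'s .jinit()/.jaddClassPath() calls.
  # These confirm the driver JVM can see the jars; they do NOT put the
  # classes on the Mesos executors, where the exception is actually raised.
  print(.jclassPath())        # jars currently on the driver's class path
  .jfindClass("MSTInSpark")   # errors if the class cannot be resolved locally
  .jfindClass("Prims")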