Re: [Rd] How to efficiently share data (a dataframe) between R and Java

2015-12-15 Thread Ing. Jaroslav Kuchař
Dear all,

thank you for your hints. I would prefer not to use Rserve, as Dirk
mentioned.

@Simon
I have full control over the Java implementation - I can adapt the code
that I use for the communication R <-> Java. 

> You can natively access structures on each side. The fastest way is to
> use R representation (column-oriented) in Java - that is much faster
> than any kind of serialization or anything you mention above since you
> pass the variables as a whole.

Could you please point me to any further examples or documentation
that would help?
The main goal is to copy a full dataframe from R to Java.

Best regards,
Jaroslav

On 2015-12-07 03:19, Simon Urbanek wrote:
> On Dec 6, 2015, at 12:36 PM, Ing. Jaroslav Kuchař wrote:
> 
>> Dear all,
>>
>> in our ongoing project we use Java implementations of several
>> algorithms. We also provide a “wrapper” implemented as an R package
>> using rJava (https://github.com/jaroslav-kuchar/rCBA). Based on our
>> recent experiments, a significant portion of time is spent on copying
>> a dataframe from R to Java. The Java implementation needs access to the
>> source dataframe.
>>
>> I have tested several approaches: calling a Java method row by row;
>> serializing the whole dataframe to a temp file and parsing it in Java;
>> or row-binding into a single vector and calling a single Java method.
>> Each approach has its limitations, e.g. time-consuming row-by-row
>> copying, serialization and parsing performance, or the memory limits
>> of a single vector.
>>
>> Is there an efficient way to copy a dataframe from R to Java
>> and another one from Java to R?
>>
>> Thanks for any help you can provide...
>>
> 
> You can natively access structures on each side. The fastest way is to
> use R representation (column-oriented) in Java - that is much faster
> than any kind of serialization or anything you mention above since you
> pass the variables as a whole.
> 
> Typically, the bottleneck is Java applications which may require very
> inefficient data structures. If you have control over the algorithms,
> you can simply use proper data structures and avoid that problem. If
> you don't have control, you'll have to add Java code that converts to
> whatever structure is needed by the Java code from the data frame
> pushed to the Java side. The main point here is that you do NOT want
> to do any conversion on the R side.
> 
> Cheers,
> Šimon
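
The advice quoted above (keep the column-oriented layout and do any
conversion on the Java side) could be sketched roughly as follows. This is
only an illustration under stated assumptions: the class ColumnsToRows, the
Row type and the toRows method are hypothetical names that do not appear in
the thread, and the sketch assumes the numeric columns arrive as double[]
arrays and one label column as a String[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: converts column arrays
// (the layout R uses) into row objects on the Java side.
public class ColumnsToRows {

    // Hypothetical row type standing in for whatever structure the
    // Java algorithm actually requires.
    public static final class Row {
        public final double[] numeric;
        public final String label;

        public Row(double[] numeric, String label) {
            this.numeric = numeric;
            this.label = label;
        }
    }

    // Builds one Row per observation from the column arrays.
    public static List<Row> toRows(double[][] numericColumns, String[] labels) {
        int n = labels.length;
        for (double[] column : numericColumns) {
            if (column.length != n) {
                throw new IllegalArgumentException("columns differ in length");
            }
        }
        List<Row> rows = new ArrayList<>(n);
        for (int r = 0; r < n; r++) {
            double[] values = new double[numericColumns.length];
            for (int c = 0; c < numericColumns.length; c++) {
                values[c] = numericColumns[c][r];
            }
            rows.add(new Row(values, labels[r]));
        }
        return rows;
    }
}
```

The point of the sketch is only that the row-wise loop runs in Java over
arrays that were handed over whole, rather than R making one call per row.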

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] How to efficiently share data (a dataframe) between R and Java

2015-12-15 Thread Simon Urbanek
You can pass the entire data frame. For example:

> data(iris)
> iris$sp = as.character(iris$Species)
> o=.jarray(lapply(iris, .jarray))
> .jcall("C",,"df",o)
df, 6 variables
[0]: double[150]
[1]: double[150]
[2]: double[150]
[3]: double[150]
[4]: int[150]
[5]: String[150]


Java code:

public class C {
    // Receives the data frame as an array of columns. Each column is a
    // primitive or String array, depending on the type of the R vector.
    static void df(Object[] df) {
        int n = df.length;
        System.out.println("df, " + n + " variables");
        for (int i = 0; i < n; i++) {
            if (df[i] instanceof double[]) {
                double[] d = (double[]) df[i];
                System.out.println("[" + i + "]: double[" + d.length + "]");
            } else if (df[i] instanceof int[]) {
                int[] d = (int[]) df[i];
                System.out.println("[" + i + "]: int[" + d.length + "]");
            } else if (df[i] instanceof String[]) {
                String[] s = (String[]) df[i];
                System.out.println("[" + i + "]: String[" + s.length + "]");
            } else {
                System.out.println("[" + i + "]: some other type...");
            }
        }
    }
}

Normally, you wouldn't pass the entire df but would instead have methods
that take the types you care about, e.g. the modeling function. That's a more
Java-like approach, but either is valid and there is no difference in
efficiency.

Cheers,
Simon
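
The alternative mentioned in the last paragraph of this message, a method
that takes only the typed columns it needs instead of the whole data frame,
might look something like the sketch below. The class TypedColumns and the
method meanForLabel are hypothetical names chosen for illustration; they
stand in for a real modeling function and are not part of rJava or of the
code in this thread.

```java
// Hypothetical example, not from the thread: a method with a typed
// signature that receives whole columns rather than an Object[].
public class TypedColumns {

    // Returns the mean of the values whose label equals target,
    // or NaN if no label matches.
    public static double meanForLabel(double[] values, String[] labels, String target) {
        if (values.length != labels.length) {
            throw new IllegalArgumentException("columns differ in length");
        }
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            if (target.equals(labels[i])) {
                sum += values[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

With a signature like this no instanceof checks are needed on the Java
side, since each argument already has a fixed array type.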



