[Rd] libRblas.so: undefined reference to `xerbla_' ?

2011-01-22 Thread Andre Zege
Hi all, I am trying to compile a test that calls R's BLAS/LAPACK shared 
libraries from C code. In particular, I am calling the simple LAPACK driver 
dposv, which solves the linear system A*x = B for positive definite A. My 
code looks like the following in 


solve.c 
== 
#include <stdio.h> 
#include <R_ext/BLAS.h> 
#include <R_ext/Lapack.h> 


int main(){ 
  double A[4]={1,0.5,0.5,1}; 
  double B[2]={3,4}; 
  char uplo='U'; 
  int n = 2, nrhs=1, lda=2, ldb=2, info, i; 
  F77_CALL(dposv)(&uplo,&n, &nrhs, A, &lda, B, &ldb, &info); 
  for(i = 0; i < 2; i++){ 
    printf("%f\n", B[i]); 
  } 
  return info; 

} 
== 
When I try to link against R's BLAS/LAPACK with 

gcc -std=gnu99 solve.c -o test -I$R_HOME/include -L$R_HOME/lib -lRblas 
-lRlapack -lgfortran 

the linker generates an error: 

$R_HOME/lib/libRblas.so: undefined reference to `xerbla_' 

Dumping the symbol table shows that libRblas.so does indeed have an undefined 
xerbla_ symbol, and so does libRlapack.so. Confusingly, the documentation says 
that xerbla is the error-handling routine for BLAS, yet it is not found in 
libRblas. 


I did find that xerbla is defined in libR.so, and when I link against the R 
library everything seems to work. However, I have a nagging feeling I am doing 
something wrong: it doesn't make sense to me that code which doesn't use R 
itself cannot be compiled without linking to R. Also, one would want to switch 
transparently between different BLAS implementations (for testing, say) 
without modifying the link instructions. I would appreciate it if someone with 
a better understanding of R commented on how to properly link against the BLAS 
and LAPACK libraries included with R.
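
For reference, the workaround described above corresponds to a link line like 
the following; adding -lR (so that xerbla_ from libR.so resolves) is inferred 
from the post itself, not an officially documented recipe: 

gcc -std=gnu99 solve.c -o test -I$R_HOME/include -L$R_HOME/lib -lRblas \
-lRlapack -lR -lgfortran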



[Rd] Newbie Rcpp module question. "Failed to initialize module pointer"???

2011-02-17 Thread Andre Zege
Hi all. I started looking at Rcpp, which looks pretty great, actually. At the 
moment I am just trying to compile a module to get a feel for how it all 
works, without fully understanding how all the pieces fit together. 


Basically, I took the first example from the Rcpp modules vignette: 

fun.cpp 
 
#include <Rcpp.h> 
#include <math.h> 

using namespace Rcpp; 

double norm(double x, double y){ 
  return sqrt(x*x+y*y); 
} 

RCPP_MODULE(mod){ 
  function("norm", &norm); 
} 
== 

I then ran Rcpp.package.skeleton("mypackage"), put fun.cpp in mypackage/src, 
and ran 

R CMD INSTALL mypackage 

which seemed to compile mypackage.so OK. However, when I try to use the module 
I get an error message. Namely, after starting R I do 

> library("Rcpp") 
> library("mypackage") 
> mod <- Module("mod") 
> mod$norm(3, 4) 

and I get the following: 

Error in Module(module, mustStart = TRUE) : 
  Failed to initialize module pointer: Error in 
  FUN("_rcpp_module_boot_mod"[[1L]], ...): no such symbol 
  _rcpp_module_boot_mod in package .GlobalEnv 



I am pretty sure my error is an obvious one; could someone give me a pointer 
on what to do differently, or where to look for reference? A literal search 
for the error message doesn't turn up anything useful.
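
For what it's worth, the error text says the module symbol is being looked up 
in package .GlobalEnv, so a likely fix (an assumption based on the Rcpp 
modules API, not confirmed in this thread) is to tell Module() which package 
the module lives in: 

> mod <- Module("mod", PACKAGE = "mypackage") 
> mod$norm(3, 4)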



[Rd] can one modify array in R memory from C++ without copying it?

2011-11-01 Thread andre zege
Hi, guys. I posted this by accident at rcpp-devel, although it was meant only
for r-devel, so don't flame me here please; the rcpp guys will do it there, I
am sure :).
I have some pretty large arrays in R, and I wanted to do some time-consuming
modifications of these arrays in C++ without actually copying them, just by
passing pointers to them. Since I don't know the internal data structures of
R, I am not sure it's possible, but I thought it was. Here is some toy code
that I thought should work, but doesn't. Maybe someone could point out the
error I am making.

I have the following in passptr.cpp to multiply the array elements by 2:
===
extern "C" {
  void modify(double *mem, int *nr, int *nc){
    for(int i = 0; i < (*nr)*(*nc); i++)
      mem[i] = 2*mem[i];
  }
}

--
I compile it into a shared library using

R CMD SHLIB passptr.cpp

then load and run it from R as follows:



> dyn.load("/home/az05625/testarma/passptr.so")
> m <- matrix(1:10, nr = 2)
> .C("modify", as.double(m), as.integer(2), as.integer(5), DUP = FALSE)

From reading the docs I thought that DUP=FALSE would ensure that the R matrix
is not copied and is multiplied by 2 in place. However, that is not the case:
matrix m is the same after calling .C("modify", ...) as it was before. Am I
calling it incorrectly, or is it just impossible to modify an R matrix in
place from C++? Would greatly appreciate any pointers.
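
One plausible culprit (my reading; the thread itself does not settle it): m is
an integer matrix, so as.double(m) already makes a copy, and DUP=FALSE then
applies to that temporary rather than to m. A sketch that avoids the coercion
and, more portably, uses the returned value instead of relying on in-place
modification:

> m <- matrix(as.double(1:10), nrow = 2)  # double storage, so no coercion copy at call time
> res <- .C("modify", mat = m, as.integer(nrow(m)), as.integer(ncol(m)))
> m2 <- matrix(res$mat, nrow = nrow(m))   # modified values come back in the result list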



[Rd] unexpectedly high memory use in R 2.14.0

2012-04-11 Thread andre zege
I recently started using R 2.14.0 on a new machine and I am experiencing what
seems like unusually greedy memory use. It happens all the time, but to give a
specific example, say I run the following code:



for(j in 1:length(files)){
  load(file.path(dump.dir, files[j]))
  mat.data[[j]] <- data
}
save(abind(mat.data, along=2), file.path(dump.dir, filename))

-

It loads the parts of a multidimensional matrix into a list, then binds them
along the second dimension and saves the result to disk. The code works,
although slowly, but what's strange is the amount of memory it uses.
In particular, each chunk of data is between 50M and 100M, and altogether the
bound matrix is 1.3G. One would expect R to use roughly double that memory
(keeping mat.data and its bound version separately), i.e. about 2.6G. I could
imagine it somehow using 3 times the size of the matrix. But in fact it uses
more than 5.5 times that (almost all of my physical memory), and I think it is
swapping a lot to disk. For this particular task, top shows R eating more than
7G of resident memory and using 11G of virtual memory as well:

$ top

  PID  USER  PR  NI  VIRT   RES   SHR   S  %CPU  %MEM     TIME+  COMMAND
 8823  user  25   0   11g  7.2g   10m   R  99.7  92.9   5:55.05  R
 8590  root  15   0  154m   16m  5948   S   0.5   0.2  23:22.40  Xorg


I have a strong suspicion that something is off with my R binary; I don't
think I have experienced anything like this in a long time. Is this in line
with what I should expect? Are there any ideas for diagnosing what is going
on? Would appreciate any suggestions.

Thanks
Andre


==

Here is what I am running on:


CentOS release 5.5 (Final)


> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

other attached packages:
[1] abind_1.4-0       rJava_0.9-3       R.utils_1.12.1    R.oo_1.9.3
[5] R.methodsS3_1.2.2

loaded via a namespace (and not attached):
[1] codetools_0.2-8 tcltk_2.14.0    tools_2.14.0



I configured R as follows:

./configure --prefix=/usr/local/R --enable-byte-compiled-packages=no
--with-tcltk --enable-R-shlib=yes



Re: [Rd] unexpectedly high memory use in R 2.14.0

2012-04-11 Thread andre zege
You are quite right that my execution time would go down substantially if I
preallocated, and didn't even use abind but just assigned into the
preallocated matrix. The reason I didn't do that here is that this is part of
a utility function that doesn't know the sizes of the chunks on disk until it
has read all of them. If I knew a way to read dimnames off disk without
reading the whole matrices, I could do what you are suggesting. I guess I am
better off using filebacked matrices from bigmemory, where I can read dimnames
off disk without reading the matrix. I need to unwrap 4-dimensional arrays
into 2-dimensional arrays and wrap them back, but I guess it would be faster
anyway.
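
A hedged possibility for the dimnames problem (purely a sketch, with
hypothetical file names): save the dimnames next to each chunk at dump time,
so they can later be loaded without touching the data.

> dn <- dimnames(data)  # at dump time, alongside each chunk
> save(dn, file = file.path(dump.dir, paste0(files[j], ".dimnames")))
> load(file.path(dump.dir, paste0(files[j], ".dimnames")))  # later: cheap, restores 'dn'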

My question, however, was not so much about speeding up a particular task. It
was whether using 7.2G of physical memory and 11G of virtual makes sense when
I am building a 1.3G matrix with this code. It just seems to me that my memory
use goes to almost 100% of physical memory, not just on this task but on
others. I wonder if there is something seriously off with my setup and whether
I should rebuild R.

As for your lapply solution, it did indeed use much less memory: about 25%
less than the loop, about 4 times the size of the final object. I am still not
clear whether my memory use makes sense in terms of the R memory model, and I
am frankly not clear why lapply uses less memory (I understand why it does
less copying).

On Wed, Apr 11, 2012 at 7:15 PM, peter dalgaard  wrote:

>
> On Apr 12, 2012, at 00:53 , andre zege wrote:
>
> > I recently started using R 2.14.0 on a new machine and i am  experiencing
> > what seems like unusually greedy memory use. It happens all the time, but
> > to give a specific example, let's say i run the following code
> >
> > 
> >
> > for(j in 1:length(files)){
> >  load(file.path(dump.dir, files[j]))
> >  mat.data[[j]]<-data
> > }
> > save(abind(mat.data, along=2), file.path(dump.dir, filename))
>
> Hmm, did you preallocate mat.data? If not, you will be copying it
> repeatedly, and I'm not sure that this can be done by copying pointers only.
>
> Does it work better with
>
> mat.data <- lapply(files, function(name) {load(file.path(dump.dir, name));
> data})
>
> ?
>
>
> > [...]
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] unexpectedly high memory use in R 2.14.0

2012-04-12 Thread andre zege
Henrik, thanks for your reply. I might have misrepresented my actual code a
bit. You seem to be suggesting calling rm() on objects I don't use; in the
real code whose behavior I reported, that is exactly what is done, i.e. I use
rm(). I also use a small wrapper around load() that lets me assign the loaded
data directly into a variable with any name, without remembering the name of
the object under which it was saved. That is, instead of standard load I use
something like (with error checking in the real code)

ut.load <- function(filename){ obj <- load(filename); get(obj) }

In other words, after I call data[[j]] <- ut.load(files[j]), there is no
reference to an intermediate object left to clean up; I assume the garbage
collector quickly takes care of it.
Just making sure we are on the same page: I am mostly looking for guidance on
what to expect of R's memory behavior. This particular task is just an
illustration of a typical issue I have been encountering often lately. Is
there a way to diagnose whether a particular task's memory use is normal? Is
there a memory benchmark? Is there a white paper discussing how memory and
copying of objects actually work in R? Is there a limited chunk of C code I
could read to try to understand it? I just don't want to read all of the C
code.

Thanks much
Andre



On Wed, Apr 11, 2012 at 9:02 PM, Henrik Bengtsson wrote:

> Leaving aside what's going on inside abind::abind(), maybe the
> following sheds some light on what's is being wasted:
>
> # Preallocate (probably doesn't make a difference because it's a list)
> mat.data <- vector("list", length=length(files));
> for (j in 1:length(files)){
>   vars <- load(file.path(dump.dir, files[j]))
>   mat.data[[j]] <- data;
>   # Not needed anymore / remove everything loaded
>   rm(list=vars);
> }
>
> data <- abind(mat.data, along=2);
> # Not needed anymore
> rm(mat.data);
>
> save(data, file.path(dump.dir, filename))
>
> My $.02
> /Henrik
>
> On Wed, Apr 11, 2012 at 3:53 PM, andre zege  wrote:
> > [...]

[Rd] R-2.15 compile error: fatal error: internal consistency failure

2012-04-17 Thread andre zege
I am unable to compile the R-2.15.0 source. I configured it without problems,
with options I have used many times before:

./configure --prefix=/home/andre/R-2.15.0
--enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes

Then when I started making it, it died while building lapack, specifically on
the line

gfortran  -fopenmp -fpic  -g -O2  -c dlapack3.f -o dlapack3.o
dlapack3.f: In function ‘dsbgst’:
dlapack3.f:12097: fatal error: internal consistency failure
compilation terminated.
make[4]: *** [dlapack3.o] Error 1

Could anyone give me a clue as to what is going wrong and how I could fix it?
I am running CentOS 5.5; specifically:

$ more /proc/version
Linux version 2.6.18-194.el5 (mockbu...@builder10.centos.org) (gcc version
4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Apr 2 14:58:14 EDT 2010

Thanks
Andre
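
An "internal consistency failure" is an error inside gfortran itself rather
than in the R sources. A common workaround for compiler internal errors,
offered here as an assumption rather than a confirmed fix for this gfortran
4.1.2 issue, is to lower the Fortran optimization level and re-run configure
(upgrading gfortran would be the other obvious route):

./configure FFLAGS='-g -O' --prefix=/home/andre/R-2.15.0 \
--enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes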



[Rd] looking for advice on bigmemory framework with C++ and java interoperability

2012-05-04 Thread andre zege
I work with problems that have rather large data requirements: typically a
bunch of multi-gigabyte arrays. Given how generous R is with memory, the only
way for me to work with R has been to use big matrices from the bigmemory
package. One thing that is missing a bit is interoperability of big matrices
with C++ and possibly Java; what I mean by that is an API that would allow
reading and writing filebacked matrices from C++, and ideally Java, without
being called from R. The ability to save Armadillo matrices into filebacked
matrices and load them back into Armadillo would be another very useful thing.
This would allow really smooth cooperation between various pieces of software.
I would prefer to avoid using RInside for that.

I guess I could hack the bigmemory C++ code a bit and compile it into a C++
shared library, and that would do; I could probably hack it to work with
Armadillo matrices as well. I don't want to reinvent the wheel, however, and
if something like this already exists I would rather use it for the moment.
Very much looking for suggestions. If there is truly nothing like this, and
someone with C++ or especially Java development experience is interested and
wants to cooperate on it, let me know too.

Best
Andre

NB. I guess something like what I want (access to the same disk caches from
R, C++, Java, and Python) exists in the HDF world. I don't know, however, how
the performance of HDF compares with bigmemory matrices, which I have come to
like and appreciate a lot. If someone could speak to the simplicity of use and
performance of HDF vs bigmemory, that would be very interesting.



Re: [Rd] looking for advice on bigmemory framework with C++ and java interoperability

2012-05-04 Thread andre zege
>
> bigmemory matrices are simply arrays of native types (typically doubles,
> but bm supports other types, too) so they are trivially readable/writable
> from both C++ (just read into memory and cast to the array type) and Java
> (e.g., DoubleBuffer view on a ByteBuffer). So the question is what exactly
> is the problem?
>
> Cheers,
> Simon
>
>
>

Simon, thanks for your comment. I guess there is no problem; I am apparently
being lazy/busy and wondered whether ready-made code for this exists. You are
right, I suppose: I'll look at the C++ code for big.matrix and will try to
hack a solution.


Thanks
Andre
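
For reference, a minimal sketch of the C++ side of what Simon describes
(treat the backing file as a raw array of native doubles and map it), assuming
a filebacked matrix of doubles with no header and with nrow/ncol taken from
the .desc file; the function name and error handling are mine:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Map a filebacked big.matrix (raw column-major doubles) read-only.
// nrow and ncol must come from the .desc file; they are assumed known here.
const double* map_big_matrix(const char* path, std::size_t nrow, std::size_t ncol) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    std::size_t bytes = nrow * ncol * sizeof(double);
    void* p = mmap(nullptr, bytes, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return (p == MAP_FAILED) ? nullptr : static_cast<const double*>(p);
}

An Armadillo matrix could then wrap the mapped memory without copying (the
arma::mat(aux_mem, n_rows, n_cols, copy_aux_mem = false) constructor uses the
supplied memory), with the usual caveats about writable mappings.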



[Rd] how to manipulate dput output format

2012-06-19 Thread andre zege
I am reading dput() output for a matrix into Java, more specifically for a
filebacked big.matrix. I basically need to lift the dimnames for a matrix out
of the dput output. It's no big deal, but the code is very 'hackish' due to
the need to get rid of quotes, newlines, parentheses, etc. I was wondering if
I could manipulate the dput output to some extent with options, for example
get rid of the quoting of each element in the matrix dimnames. Another great
thing would be to have dput dump rownames and colnames on two separate lines,
but I don't think that's possible. To give a specific example, instead of
dput output like


new("big.matrix.descriptor"
, description = structure(list(sharedType = "FileBacked", filename =
"res", totalRows = 1528,
totalCols = 53040, rowOffset = c(0, 1528), colOffset = c(0,
53040), nrow = 1528, ncol = 53040, rowNames = c("A", "AA",
"RNT.A", "ADVA", "AAPL", "AAS", "ABFS", "ABM", "ABT", "ACI",
...

I'd ideally prefer to have it in a form where the rownames and colnames don't
have quotes and newlines, and if possible are on separate lines:

new("big.matrix.descriptor"
, description = structure(list(sharedType = "FileBacked", filename =
"res", totalRows = 1528,
totalCols = 53040, rowOffset = c(0, 1528), colOffset = c(0,
53040), nrow = 1528, ncol = 53040,
rowNames = c(A, AA, RNT.A, ADVA, AAPL, AAS, ABFS, ABM, ABT, ... )
colNames = c(...)
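
If the goal is just getting the dimnames across to Java, one alternative (my
suggestion; it assumes the .desc file parses with dget() once bigmemory is
loaded, and "res.desc" is a hypothetical file name) is to re-emit them from R
as plain text:

library(bigmemory)
desc <- dget("res.desc")                # .desc files are written with dput()
d <- desc@description
writeLines(d$rowNames, "rownames.txt")  # one name per line, trivial to read from Java
writeLines(d$colNames, "colnames.txt")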



Re: [Rd] how to manipulate dput output format

2012-06-25 Thread andre zege
>
> dput() is intended to be parsed by R, so the above is not possible without
> massaging the output. But why in the world would you use dput() for
> something that you want to read in Java? Why don't you use a format that
> Java can read easily, such as JSON?
>
> Cheers,
> Simon
>
>
>
>
>
Yeap, except I was working with someone else's choice: the bigmemory code uses
dput() to dump the .desc file of filebacked matrices. I got some time to do a
little hack for reading big matrices nicely into Java and was looking for ways
to smooth the edges of parsing the .desc file a little. I guess I am OK now
with parsing the .desc with some regexes. One thing I am still wondering about
is whether I really need to convert back and forth between little endian and
big endian. Namely, the Java platform has little-endian native byte order, and
the big.matrix code writes in big endian. It would be nice if I could control
that with some #define somewhere in the makefile or something, and make the
C++ write little endian without byte swapping every time I need to communicate
with a big matrix from Java. Thanks

Andre



Re: [Rd] how to manipulate dput output format

2012-06-25 Thread andre zege
On Mon, Jun 25, 2012 at 11:17 AM, Simon Urbanek  wrote:

>
> On Jun 25, 2012, at 10:20 AM, andre zege wrote:
>
> > [...]
> > Yeap, except I was working with someone else's choice: the bigmemory code
> > uses dput() to dump the .desc file of filebacked matrices.
>
> Ah, ok, that is indeed rather annoying as it's pretty much the most
> non-portable storage (across programs) one could come up with. (I presume
> you're talking about big.matrix from bigmemory?)
>
>
> > I got some time to do a little hack for reading big matrices nicely into
> > Java and was looking for ways to smooth the edges of parsing the .desc
> > file a little. I guess I am OK now with parsing the .desc with some
> > regexes. One thing I am still wondering about is whether I really need to
> > convert back and forth between little endian and big endian. Namely, the
> > Java platform has little-endian native byte order, and the big.matrix code
> > writes in big endian. It would be nice if I could control that with some
> > #define somewhere in the makefile or something, and make the C++ write
> > little endian without byte swapping every time I need to communicate with
> > a big matrix from Java.
>
> I think you're wrong (if we are talking about bigmemory) - the endianness
> is governed by the platform as far as I can see. On little-endian machines
> the big matrix storage is little endian and on big-endian machines it is
> big-endian.
>
> It's very peculiar that the descriptor doesn't even store the endianness -
> I think you could talk to the authors and suggest that they include most
> basic information such as endianness and, possibly, change the format to
> something that is well-defined without having to evaluate it in R (which is
> highly dangerous and a serious security risk).
>
> Cheers,
> Simon
>
>

I would assume that the hardware dictates endianness, just as you said.
However, the fact is that bigmemory writes in a different endianness than Java
reads in. I simply compared matrices that I write using bigmemory with what I
read into Java: unless I transform the endianness I get garbage, and if I swap
the byte order I get the same matrix as the one I wrote. So I don't think I am
wrong about that, but I am curious why it happens and whether it is possible
to make the bigmemory code write in natural endianness. Then I would not need
to transform each double array element back and forth.



Re: [Rd] how to manipulate dput output format

2012-06-25 Thread andre zege
On Mon, Jun 25, 2012 at 1:08 PM, Simon Urbanek
wrote:

>
> On Jun 25, 2012, at 11:57 AM, andre zege wrote:
> > [...]
>
> I think it has to do with the way you read it in Java since Java supports
> either endianness directly. What methods do you use exactly to read it? The
> on-disk storage is definitely native-endian so C/C++/... can simply mmap it
> with no swapping.
>
> Cheers,
> Simon
>
>
>


It's my first week doing Java, actually:),I simply did the following to
read binary file

public static double[] readVector(String fileName) throws IOException {
    FileChannel rChannel = new RandomAccessFile(new File(fileName),
            "r").getChannel();
    DoubleBuffer dBuf = rChannel.map(FileChannel.MapMode.READ_ONLY, 0,
            rChannel.size()).asDoubleBuffer();

    double[] vData = new double[(int) (rChannel.size() / 8)];
    dBuf.get(vData);
    return vData;
}

I just realized that DoubleBuffer is derived from ByteBuffer, and reading the
Java 5 doc for ByteBuffer I see "The initial order of a byte buffer is always
BIG_ENDIAN". So in fact I just need to check the ByteOrder and change it if it
differs from the native order. The correct code, it seems, should look like
this:


public static double[] readVector(String fileName) throws IOException {
    FileChannel rChannel = new RandomAccessFile(new File(fileName),
            "r").getChannel();
    MappedByteBuffer mbb = rChannel.map(FileChannel.MapMode.READ_ONLY,
            0, rChannel.size());
    if (mbb.order() != ByteOrder.nativeOrder())
        mbb.order(ByteOrder.nativeOrder());

    DoubleBuffer dBuf = mbb.asDoubleBuffer();
    double[] vData = new double[(int) (rChannel.size() / 8)];
    dBuf.get(vData);
    return vData;
}

Sorry for the confusion and thanks for the lesson, Simon :)
