[Rd] libRblas.so: undefined reference to `xerbla_' ?
Hi all, I am trying to compile a test that calls R's LAPACK shared libraries from C code. In particular, I am calling the simple LAPACK driver dposv to solve the linear system A*x = B with positive definite A. My code in solve.c looks like the following:

==
#include <stdio.h>
#include <R.h>
#include <R_ext/Lapack.h>

int main(){
    double A[4] = {1, 0.5, 0.5, 1};
    double B[2] = {3, 4};
    char uplo = 'U';
    int n = 2, nrhs = 1, lda = 2, ldb = 2, info, i;

    F77_CALL(dposv)(&uplo, &n, &nrhs, A, &lda, B, &ldb, &info);
    for(i = 0; i < 2; i++){
        printf("%f\n", B[i]);
    }
    return info;
}
==

When I try to link to BLAS/LAPACK using

gcc -std=gnu99 solve.c -o test -I$R_HOME/include -L$R_HOME/lib -lRblas -lRlapack -lgfortran

the linker generates an error message:

$R_HOME/lib/libRblas.so: undefined reference to `xerbla_'

Dumping the symbol table shows that libRblas.so does indeed have an undefined xerbla_ symbol, and so does libRlapack. Confusingly, the documentation says that xerbla is the error-handling routine for BLAS, yet it is not defined in libRblas. I did find that xerbla_ is defined in libR.so, and when I link against the R library everything goes fine. However, I have a nagging feeling that I am doing something wrong. It doesn't make sense to me that I cannot compile code that doesn't use R without linking to R. Also, one would want to switch transparently between different BLAS implementations, for example for testing purposes, without modifying the link instructions. I would appreciate it if someone with a better understanding of R commented on how to properly link to the BLAS and LAPACK libraries included with R.
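[Editorial note, not from the thread: one common workaround, sketched below under the assumption that all you need is to satisfy the unresolved symbol, is to supply your own xerbla_ and add it to the link line. The file name xerbla_stub.c is hypothetical; the signature follows the reference BLAS error handler, including the hidden string-length argument that gfortran appends for character parameters.]

---
/* xerbla_stub.c -- hypothetical stand-in for the BLAS error handler,
 * a sketch only. Assumes the gfortran convention of a trailing hidden
 * length argument for the character parameter. */
#include <stdio.h>
#include <stdlib.h>

void xerbla_(const char *srname, int *info, int srname_len)
{
    /* Report which routine rejected which parameter, then abort,
     * mirroring what the reference implementation does. */
    fprintf(stderr,
            " ** On entry to %.*s, parameter number %d had an illegal value\n",
            srname_len, srname, *info);
    exit(EXIT_FAILURE);
}
---

Compiled together with the test, e.g. gcc -std=gnu99 solve.c xerbla_stub.c -o test -I$R_HOME/include -L$R_HOME/lib -lRblas -lRlapack -lgfortran, this would remove the dependency on libR.so.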
[Rd] Newbie Rcpp module question. "Failed to initialize module pointer"???
Hi all. I started looking at Rcpp, which looks pretty great, actually. At the moment I am just trying to compile a module to get a feel for how it all works, without fully understanding how all the pieces fit together. Basically, I took the first example from the Rcpp modules vignette, fun.cpp:

==
#include <Rcpp.h>
#include <cmath>

using namespace Rcpp;

double norm(double x, double y){
    return sqrt(x*x + y*y);
}

RCPP_MODULE(mod){
    function("norm", &norm);
}
==

I then ran Rcpp.package.skeleton("mypackage"), put fun.cpp in mypackage/src, and did R CMD INSTALL mypackage, which seemed to compile mypackage.so OK. However, when I try to use the module, I get an error message. Namely, after I start R and do

> library("Rcpp")
> library("mypackage")
> mod <- Module("mod")
> mod$norm(3, 4)

I get the following:

Error in Module(module, mustStart = TRUE) :
  Failed to initialize module pointer: Error in
  FUN("_rcpp_module_boot_mod"[[1L]], ...): no such symbol
  _rcpp_module_boot_mod in package .GlobalEnv

I am pretty sure my error is an obvious one; could someone give me a pointer on what to do differently, or where to look for a reference? A literal search for the error message doesn't bring up anything useful.
[Rd] can one modify array in R memory from C++ without copying it?
Hi, guys. I posted this by accident at rcpp-devel, although it was meant only for r-devel, so don't flame me here, please - the Rcpp guys will do it there, I am sure :). I have some pretty large arrays in R, and I want to do some time-consuming modifications of these arrays in C++ without actually copying them, just by passing pointers to them. Since I don't know the internal data structures of R, I am not sure it's possible, but I thought it was. Here is some toy code that I thought should work, but doesn't. Maybe someone could point out the error I am making. I have the following in passptr.cpp, to multiply the array elements by 2:

===
extern "C" {
    void modify(double *mem, int *nr, int *nc){
        for(int i = 0; i < (*nr) * (*nc); i++)
            mem[i] = 2 * mem[i];
    }
}
===

I compile it into a shared library using R CMD SHLIB passptr.cpp, then load and run it from R as follows:

> dyn.load("/home/az05625/testarma/passptr.so")
> m <- matrix(1:10, nr=2)
> .C("modify", as.double(m), as.integer(2), as.integer(5), DUP=FALSE)

From reading the docs I thought that DUP=FALSE would ensure that the R matrix is not copied and is multiplied by 2 in place. However, that is not the case: matrix m is the same after calling .C("modify", ...) as it was before. Am I calling it incorrectly, or is it just impossible to modify an R matrix in place from C++? Would greatly appreciate any pointers.
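[Editorial aside, not from the thread: as.double(m) coerces the integer matrix produced by matrix(1:10, nr=2) into a freshly allocated double vector, so the array the C code touches is never m's own storage, regardless of DUP. A minimal .Call-based sketch that does write into the vector's own storage - assuming the caller passes a matrix whose storage mode is already "double" - might look like this:]

---
/* passptr2.c -- a sketch only. .Call hands the C code the SEXP itself,
 * so writes land in R's own memory; no DUP semantics are involved.
 * The same body works in C++ inside an extern "C" block. */
#include <R.h>
#include <Rinternals.h>

SEXP modify2(SEXP mat)
{
    double *p = REAL(mat);      /* pointer into R's storage; requires a
                                   double (REALSXP) argument */
    int i, n = LENGTH(mat);
    for (i = 0; i < n; i++)
        p[i] = 2 * p[i];
    return mat;
}
---

Compiled with R CMD SHLIB passptr2.c and called as .Call("modify2", m) after m <- matrix(as.numeric(1:10), nrow=2), this modifies m in place. The usual caveat applies: in-place modification breaks R's copy-on-assign illusion whenever the vector's storage is shared, so it is safe only when you control all references to m.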
[Rd] unexpectedly high memory use in R 2.14.0
I recently started using R 2.14.0 on a new machine, and I am experiencing what seems like unusually greedy memory use. It happens all the time, but to give a specific example, let's say I run the following code:

for(j in 1:length(files)){
    load(file.path(dump.dir, files[j]))
    mat.data[[j]] <- data
}
combined <- abind(mat.data, along=2)
save(combined, file=file.path(dump.dir, filename))

-

It loads parts of a multidimensional matrix into a list, then binds them along the second dimension and saves the result to disk. The code works, although slowly, but what's strange is the amount of memory it uses. In particular, each chunk of data is between 50M and 100M, and altogether the combined matrix is 1.3G. One would expect R to use roughly twice that - holding mat.data and its combined version at the same time. I could imagine it somehow using 3 times the size of the matrix. But in fact it uses more than 5.5 times that (almost all of my physical memory), and I think it is swapping heavily to disk. For this particular task, top shows R eating more than 7G of resident memory and using 11G of virtual memory as well:

$ top
  PID USER  PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
 8823 user  25   0   11g  7.2g   10m  R 99.7 92.9  5:55.05  R
 8590 root  15   0  154m   16m  5948  S  0.5  0.2 23:22.40  Xorg

I have a strong suspicion that something is off with my R binary; I don't think I have experienced anything like this in a long time. Is this in line with what I should expect? Are there any ideas for diagnosing what is going on? Would appreciate any suggestions.

Thanks
Andre

==

Here is what I am running on:

CentOS release 5.5 (Final)

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] abind_1.4-0       rJava_0.9-3       R.utils_1.12.1    R.oo_1.9.3
[5] R.methodsS3_1.2.2

loaded via a namespace (and not attached):
[1] codetools_0.2-8 tcltk_2.14.0    tools_2.14.0

I configured R as follows:

./configure --prefix=/usr/local/R --enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes
Re: [Rd] unexpectedly high memory use in R 2.14.0
You are quite right that my execution time would go down seriously if I preallocated - and skipped abind altogether, just assigning into the preallocated matrix. The reason I didn't do it here is that this is part of a utility function that doesn't know the sizes of the chunks on disk until it has read all of them. If I knew a way to read dimnames off disk without reading the whole matrices, I could do what you are suggesting. I guess I am better off using filebacked matrices from bigmemory, where I can read dimnames off disk without reading the matrix. I would need to unwrap 4-dimensional arrays into 2-dimensional arrays and wrap them back, but I guess it would be faster anyway.

My question, however, was not so much about speeding up this particular task. It was whether using 7.2g of physical memory and 11g of virtual memory makes sense when building a 1.3G matrix with this code. It just seems that my memory use goes to almost 100% of physical memory, not only on this task but on others as well. I wonder whether something is seriously off with my setup and whether I should rebuild R.

As for your lapply solution, it indeed used much less memory - about 25% less than the loop, roughly 4 times the size of the final object. I am still not clear whether my memory use makes sense in terms of R's memory model, and I am frankly not clear why lapply uses less memory (I understand why it does less copying).

On Wed, Apr 11, 2012 at 7:15 PM, peter dalgaard wrote:
>
> On Apr 12, 2012, at 00:53 , andre zege wrote:
>
> > I recently started using R 2.14.0 on a new machine, and I am experiencing
> > what seems like unusually greedy memory use. It happens all the time, but
> > to give a specific example, let's say I run the following code:
> >
> > for(j in 1:length(files)){
> >     load(file.path(dump.dir, files[j]))
> >     mat.data[[j]] <- data
> > }
> > combined <- abind(mat.data, along=2)
> > save(combined, file=file.path(dump.dir, filename))
>
> Hmm, did you preallocate mat.data? If not, you will be copying it
> repeatedly, and I'm not sure that this can be done by copying pointers only.
>
> Does it work better with
>
> mat.data <- lapply(files, function(name) { load(file.path(dump.dir, name)); data })
>
> ?
>
> > [...]
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk   Priv: pda...@gmail.com
Re: [Rd] unexpectedly high memory use in R 2.14.0
Henrik, thanks for your reply. I might have misrepresented my actual code a bit. You seem to be suggesting calling rm() on objects I don't use, and in the real code whose behavior I reported, that is exactly what is done, i.e., I do use rm(). I also use a small wrapper around load() that assigns the loaded data directly into a variable with any name, without my having to remember the name of the object under which it was saved; i.e., instead of a standard load I use something like (with error checking in the real code):

ut.load <- function(filename){
    s <- load(filename)   # load() returns the names of the restored objects
    get(s[1])
}

In other words, after I call mat.data[[j]] <- ut.load(files[j]), there is no reference to an intermediate object left to clean up; I assume the garbage collector quickly takes care of it. Just making sure we are on the same page.

I am mostly looking for guidance on what to expect from R's memory behavior. This particular task is just an illustration of a typical issue I have been encountering often lately. Is there a way to diagnose whether a particular task is behaving normally in terms of memory use? Is there a memory benchmark? Is there a white paper discussing how memory and copying of objects actually work in R? Is there a limited chunk of C code I could read to try to understand it? I just don't want to read all of the C code.

Thanks much
Andre

On Wed, Apr 11, 2012 at 9:02 PM, Henrik Bengtsson wrote:
> Leaving aside what's going on inside abind::abind(), maybe the
> following sheds some light on what's being wasted:
>
> # Preallocate (probably doesn't make a difference because it's a list)
> mat.data <- vector("list", length=length(files));
> for (j in 1:length(files)){
>     vars <- load(file.path(dump.dir, files[j]))
>     mat.data[[j]] <- data;
>     # Not needed anymore/remove everything loaded
>     rm(list=vars);
> }
>
> data <- abind(mat.data, along=2);
> # Not needed anymore
> rm(mat.data);
>
> save(data, file=file.path(dump.dir, filename))
>
> My $.02
> /Henrik
>
> On Wed, Apr 11, 2012 at 3:53 PM, andre zege wrote:
> > I recently started using R 2.14.0 on a new machine and I am experiencing
> > what seems like unusually greedy memory use. [...]
[Rd] R-2.15 compile error: fatal error: internal consistency failure
I am unable to compile the R-2.15.0 source. I configured it without problems, with options I have used many times before:

./configure --prefix=/home/andre/R-2.15.0 --enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes

Then when I started make, it died while building lapack, specifically on the line

gfortran -fopenmp -fpic -g -O2 -c dlapack3.f -o dlapack3.o
dlapack3.f: In function 'dsbgst':
dlapack3.f:12097: fatal error: internal consistency failure
compilation terminated.
make[4]: *** [dlapack3.o] Error 1

Could anyone give me a clue about what is going wrong and how I could fix it? I am running CentOS 5.5; in particular:

$ more /proc/version
Linux version 2.6.18-194.el5 (mockbu...@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Apr 2 14:58:14 EDT 2010

Thanks
Andre
[Rd] looking for advice on bigmemory framework with C++ and java interoperability
I work with problems that have rather large data requirements -- typically a bunch of multi-gigabyte arrays. Given how generous R is with memory, the only way for me to work with R has been to use big.matrix objects from the bigmemory package. One thing that is missing is interoperability of these matrices with C++ and possibly Java. What I mean by that is an API that would allow reading and writing filebacked matrices from C++, and ideally Java, without being called from R. The ability to save Armadillo matrices into filebacked matrices and load them back into Armadillo would be another very useful thing. This would allow really smooth cooperation between various pieces of software. I would prefer to avoid using RInside for this. I guess I could hack the bigmemory C++ code a bit and compile it into a C++ shared library, and that would do; I could probably make it work with Armadillo matrices as well. However, I don't want to reinvent the wheel, and if something like this already exists somewhere, I would rather use it for the moment. Very much looking for suggestions. And if there is truly nothing like this, and someone with C++ or especially Java development experience is interested and wants to cooperate on it, let me know too.

Best
Andre

NB. I suspect something like what I want -- access to the same disk caches from R, C++, Java (and Python) -- exists in the HDF world. I don't know, however, how the performance of HDF compares with bigmemory matrices, which I have come to like and appreciate a lot. If someone could speak to the simplicity of use and performance of HDF vs. bigmemory, that would be very interesting.
Re: [Rd] looking for adice on bigmemory framework with C++ and java interoperability
> bigmemory matrices are simply arrays of native types (typically doubles,
> but bm supports other types, too), so they are trivially readable/writable
> from both C++ (just read into memory and cast to the array type) and Java
> (e.g., a DoubleBuffer view on a ByteBuffer). So the question is: what exactly
> is the problem?
>
> Cheers,
> Simon

Simon, thanks for your comment. I guess there is no problem; I am apparently being lazy/busy and wondered whether ready-made code for this exists. You are right, I suppose -- I'll look at the C++ code for big.matrix and try to hack up a solution.

Thanks
Andre
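[Editorial sketch, following Simon's description of the "just map it and cast" approach. The assumptions are exactly those stated above, plus two that should be treated as such: the backing file carries no header (dimensions come from parsing the .desc file separately), and the layout is column-major doubles in native byte order.]

---
/* readbm.c -- a sketch of mapping a bigmemory backing file directly.
 * Assumes: no file header, column-major doubles, native byte order;
 * nrow/ncol must be obtained elsewhere (e.g., from the .desc file). */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

double *map_backing_file(const char *path, size_t nrow, size_t ncol)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    size_t bytes = nrow * ncol * sizeof(double);
    void *p = mmap(NULL, bytes, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping survives the close */
    return (p == MAP_FAILED) ? NULL : (double *) p;
}

/* element (i, j) of a column-major matrix, 0-based indices */
double bm_get(const double *m, size_t nrow, size_t i, size_t j)
{
    return m[j * nrow + i];
}
---

For read-write access one would open with O_RDWR and map with PROT_READ | PROT_WRITE; the same layout assumptions apply.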
[Rd] how to manipulate dput output format
I am reading dput output for a matrix into Java - more specifically, for a filebacked big.matrix. I basically need to lift the dimnames of the matrix from the dput output. It's no big deal, but the code is very hackish due to the need to strip quotes, newlines, parentheses, etc. I was wondering whether I could manipulate the dput output format to some extent with options - for example, somehow get rid of the quotes around each element of the matrix dimnames. Another great thing would be to make dput dump rownames and colnames on two separate lines, but I don't think that's possible. To give a specific example, instead of dput output like

new("big.matrix.descriptor",
    description = structure(list(sharedType = "FileBacked",
        filename = "res", totalRows = 1528, totalCols = 53040,
        rowOffset = c(0, 1528), colOffset = c(0, 53040),
        nrow = 1528, ncol = 53040,
        rowNames = c("A", "AA", "RNT.A", "ADVA", "AAPL", "AAS",
            "ABFS", "ABM", "ABT", "ACI", ...

I would ideally prefer a form where rownames and colnames have no quotes or embedded newlines and, if possible, sit on separate lines:

new("big.matrix.descriptor",
    description = structure(list(sharedType = "FileBacked",
        filename = "res", totalRows = 1528, totalCols = 53040,
        rowOffset = c(0, 1528), colOffset = c(0, 53040),
        nrow = 1528, ncol = 53040,
        rowNames = c(A, AA, RNT.A, ADVA, AAPL, AAS, ABFS, ABM, ABT, ...)
        colNames = c(...)
Re: [Rd] how to manipulate dput output format
> dput() is intended to be parsed by R, so the above is not possible without
> massaging the output. But why in the world would you use dput() for
> something that you want to read in Java? Why don't you use a format that
> Java can read easily - such as JSON?
>
> Cheers,
> Simon

Yeah, except I was working with someone else's choice: the bigmemory code uses dput() to dump the .desc file of filebacked matrices. I got some time to do a little hack for reading big matrices nicely into Java and was looking for ways to smooth the edges of parsing the .desc file a little. I am OK now with parsing the .desc file with some regexes. One thing I am still wondering about is whether I really need to convert back and forth between little endian and big endian. Namely, the Java platform has a little-endian native byte order, and the big matrix code writes stuff in big endian. It would be nice if I could control that with some #define somewhere in the makefile or something, and make the C++ side write little endian, without byte-swapping every time I need to communicate with a big.matrix from Java.

Thanks
Andre
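[Editorial side note, not from the thread: the exchange below establishes that bigmemory writes in native byte order, so in this case no per-element conversion is needed at all. If explicit conversion were ever genuinely required, though, a portable 64-bit byte reversal is cheap to write; a sketch of a hypothetical helper, in C (equally valid as C++):]

---
/* swap64.c -- hypothetical helper, a sketch only: reverse the byte
 * order of a double by swapping 32-, 16-, and 8-bit groups in turn. */
#include <stdint.h>
#include <string.h>

double swap_double(double x)
{
    uint64_t u;
    double y;
    memcpy(&u, &x, sizeof u);           /* reinterpret bits, no aliasing UB */
    u = (u << 32) | (u >> 32);
    u = ((u & 0x0000FFFF0000FFFFULL) << 16) | ((u & 0xFFFF0000FFFF0000ULL) >> 16);
    u = ((u & 0x00FF00FF00FF00FFULL) <<  8) | ((u & 0xFF00FF00FF00FF00ULL) >>  8);
    memcpy(&y, &u, sizeof y);
    return y;
}
---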
Re: [Rd] how to manipulate dput output format
On Mon, Jun 25, 2012 at 11:17 AM, Simon Urbanek wrote:
>
> On Jun 25, 2012, at 10:20 AM, andre zege wrote:
>
> > > dput() is intended to be parsed by R, so the above is not possible
> > > without massaging the output. [...]
> >
> > Yeah, except I was working with someone else's choice: the bigmemory
> > code uses dput() to dump the .desc file of filebacked matrices.
>
> Ah, ok, that is indeed rather annoying, as it's pretty much the most
> non-portable storage (across programs) one could come up with. (I presume
> you're talking about big.matrix from bigmemory?)
>
> > [...] One thing I am still wondering about is whether I really need to
> > convert back and forth between little endian and big endian. Namely, the
> > Java platform has a little-endian native byte order, and the big matrix
> > code writes stuff in big endian. [...]
>
> I think you're wrong (if we are talking about bigmemory) - the endianness
> is governed by the platform as far as I can see. On little-endian machines
> the big matrix storage is little-endian, and on big-endian machines it is
> big-endian.
>
> It's very peculiar that the descriptor doesn't even store the endianness -
> I think you could talk to the authors and suggest that they include most
> basic information such as endianness and, possibly, change the format to
> something that is well-defined without having to evaluate it in R (which is
> highly dangerous and a serious security risk).
>
> Cheers,
> Simon

I would assume that the hardware should dictate endianness, just as you said. However, the fact is that bigmemory writes in a different endianness than Java reads. I simply compare matrices that I write using bigmemory with those I read into Java: unless I transform the endianness I get garbage, and if I swap the byte order I get the same matrix I wrote. So I don't think I am wrong about that, but I am curious why it happens and whether it is possible to make the bigmemory code write in native endianness. Then I would not need to transform each double array element back and forth.
Re: [Rd] how to manipulate dput output format
On Mon, Jun 25, 2012 at 1:08 PM, Simon Urbanek wrote:
>
> On Jun 25, 2012, at 11:57 AM, andre zege wrote:
>
> > [...]
> >
> > I would assume that the hardware should dictate endianness, just as you
> > said. However, the fact is that bigmemory writes in a different endianness
> > than Java reads. I simply compare matrices that I write using bigmemory
> > with those I read into Java: unless I transform the endianness I get
> > garbage, and if I swap the byte order I get the same matrix I wrote. [...]
>
> I think it has to do with the way you read it in Java, since Java supports
> either endianness directly. What methods do you use exactly to read it? The
> on-disk storage is definitely native-endian, so C/C++/... can simply mmap it
> with no swapping.
> Cheers,
> Simon

It's my first week doing Java, actually :). I simply did the following to read the binary file:

public static double[] readVector(String fileName) throws IOException {
    FileChannel rChannel = new RandomAccessFile(new File(fileName), "r").getChannel();
    DoubleBuffer dBuf = rChannel.map(FileChannel.MapMode.READ_ONLY, 0,
                                     rChannel.size()).asDoubleBuffer();
    double[] vData = new double[(int) (rChannel.size() / 8)];
    dBuf.get(vData);
    return vData;
}

I just realized that the DoubleBuffer here is a view of a ByteBuffer, and reading the Java 5 doc for ByteBuffer I see: "The initial order of a byte buffer is always BIG_ENDIAN". So in fact I just need to check the ByteOrder and change it if it differs from the native order. It seems the correct code should look like this:

public static double[] readVector(String fileName) throws IOException {
    FileChannel rChannel = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mbb = rChannel.map(FileChannel.MapMode.READ_ONLY, 0, rChannel.size());
    if (mbb.order() != ByteOrder.nativeOrder())
        mbb.order(ByteOrder.nativeOrder());
    DoubleBuffer dBuf = mbb.asDoubleBuffer();
    double[] vData = new double[(int) (rChannel.size() / 8)];
    dBuf.get(vData);
    return vData;
}

Sorry for the confusion, and thanks for the lesson, Simon :)