Re: [Rd] how to manipulate dput output format

Simon Urbanek Mon, 25 Jun 2012 10:12:33 -0700

On Jun 25, 2012, at 11:57 AM, andre zege wrote:

> 
> 
> On Mon, Jun 25, 2012 at 11:17 AM, Simon Urbanek <simon.urba...@r-project.org> 
> wrote:
> 
> On Jun 25, 2012, at 10:20 AM, andre zege wrote:
> 
> > dput() is intended to be parsed by R so the above is not possible without 
> > massaging the output. But why in the world would you use dput() for 
> > something that you want to read in Java? Why don't you use a format that 
> > Java can read easily - such as JSON?
> >
> > Cheers,
> > Simon
> >
> >
> >
> >
> >
> > Yep, except I was just working with someone else's choice. Bigmatrix code 
> > uses dput() to dump the desc file of filebacked matrices.
> 
> Ah, ok, that is indeed rather annoying as it's pretty much the most 
> non-portable storage (across programs) one could come up with. (I presume 
> you're talking about big.matrix from bigmemory?)
> 
> 
> > I got some time to do a little hack of reading big matrices nicely into Java 
> > and was looking for ways of smoothing the edges of parsing the .desc file a 
> > little. I guess I am OK now with parsing .desc with some regex. One thing 
> > I am still wondering about is whether I really need to convert back and 
> > forth between little endian and big endian. Namely, the Java platform has 
> > little endian native byte order, and big matrix code writes stuff in big 
> > endian. It'd be nice if I could manipulate that by some #define somewhere 
> > in the makefile or something and make C++ write little endian without byte 
> > swapping every time I need to communicate with big matrix from Java.
> 
> I think you're wrong (if we are talking about bigmemory) - the endianness is 
> governed by the platform as far as I can see. On little-endian machines the 
> big matrix storage is little endian and on big-endian machines it is 
> big-endian.
> 
> It's very peculiar that the descriptor doesn't even store the endianness - I 
> think you could talk to the authors and suggest that they include most basic 
> information such as endianness and, possibly, change the format to something 
> that is well-defined without having to evaluate it in R (which is highly 
> dangerous and a serious security risk).
> 
> Cheers,
> Simon
> 
> 
> 
> I would assume that hardware should dictate endianness, just like you said. 
> However, the fact is that bigmemory writes in a different endianness than Java 
> reads in. I simply compare matrices that I write using bigmemory and that I 
> read into Java. Unless I transform endianness, I get garbage, and if I swap 
> byte order, I get the same matrix as the one I wrote. So, I don't think I am 
> wrong about that, but I am curious about why it happens and whether it is 
> possible to let bigmemory code write in natural endianness. Then I would not 
> need to transform each double array element back and forth. 
>


I think it has to do with the way you read it in Java since Java supports 
either endianness directly. What methods do you use exactly to read it? The 
on-disk storage is definitely native-endian so C/C++/... can simply mmap it 
with no swapping.
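
For illustration, here is a minimal sketch of reading such data without manual 
swapping, assuming the backing file is a plain sequence of 8-byte doubles. 
The class and method names (NativeDoubles, decode) are made up for this 
example. The point is that java.nio.ByteBuffer defaults to big-endian 
regardless of the hardware, so the byte order has to be set explicitly:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of bigmemory or any existing library.
final class NativeDoubles {
    // Interpret raw bytes as consecutive 8-byte IEEE 754 doubles
    // stored in the given byte order. Trailing bytes that do not
    // fill a whole double are ignored.
    static double[] decode(byte[] raw, ByteOrder order) {
        // A freshly wrapped ByteBuffer is BIG_ENDIAN by default,
        // so the order must be set before creating the double view.
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        double[] out = new double[raw.length / Double.BYTES];
        buf.asDoubleBuffer().get(out);
        return out;
    }
}
```

If the file was written on the same machine, something like 
NativeDoubles.decode(bytes, ByteOrder.nativeOrder()) should return the values 
as written; on x86 that is ByteOrder.LITTLE_ENDIAN.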

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
