Re: [Rd] Reading 64-bit integers

Duncan Murdoch Tue, 29 Mar 2011 17:48:56 -0700

On 29/03/2011 7:01 PM, Jon Clayden wrote:

Dear Simon,


On 29 March 2011 22:40, Simon Urbanek<simon.urba...@r-project.org>  wrote:

Jon,

On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:

Dear Simon,

Thank you for the response.

On 29 March 2011 15:06, Simon Urbanek<simon.urba...@r-project.org>  wrote:


On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:

Dear all,

I see from some previous threads that support for 64-bit integers in R
may be an aim for future versions, but in the meantime I'm wondering
whether it is possible to read in integers of greater than 32 bits at
all. Judging from ?readBin, it should be possible to read 8-byte
integers to some degree, but it is clearly limited in practice by R's
internally 32-bit integer type:

x <- as.raw(c(0, 0, 0, 0, 1, 0, 0, 0))
(readBin(x, "integer", n = 1, size = 8, signed = FALSE, endian = "big"))

[1] 16777216

x <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))
(readBin(x, "integer", n = 1, size = 8, signed = FALSE, endian = "big"))

[1] 0

For values that fit into 32 bits it works fine, but for larger values
it fails. (I'm a bit surprised by the zero - should the value not be
NA if it is out of range?


No, it's not out of range - int is only 4 bytes, so only the 4
least-significant bytes (respecting endianness order) are used.


The fact remains that I ask for the value of an 8-byte integer and
don't get it.


I think you're misinterpreting the documentation:

     If ‘size’ is specified and not the natural size of the object,
     each element of the vector is coerced to an appropriate type
     before being written or as it is read.

The "integer" object type is defined as signed 32-bit in R, so if you
ask for "8 bytes into object type integer", you get a coercion into
that object type -- 32-bit signed integer -- as documented. I think the
issue may come from confusing the object type "integer" with the
general "integer number" in the mathematical sense, which has no
representation restrictions. (FWIW, in C the "integer" type is "int",
and it is 32-bit on all modern OSes regardless of platform - that's
where the limitation comes from; it's not something R has made up.)
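
As an illustration of the coercion described above, here is a minimal C sketch that keeps only the low 32 bits of a 64-bit value. The helper name low32 is made up for this example and is not part of R; it only mimics what a narrowing cast typically does on two's-complement platforms, written so that it does not depend on implementation-defined conversion.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of v and interpret them
 * as a signed 32-bit two's-complement value. */
static int32_t low32(int64_t v)
{
    uint32_t lo = (uint32_t)((uint64_t)v & 0xFFFFFFFFu);

    /* Map the unsigned bit pattern to its signed value explicitly. */
    return (lo <= (uint32_t)INT32_MAX)
        ? (int32_t)lo
        : (int32_t)((int64_t)lo - 4294967296LL);
}
```

With this, 16777216 comes back unchanged, while 4294967296 (2^32) comes back as 0, matching the two readBin() results earlier in the thread.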


OK, but it still seems like there is a case for raising a warning. As
it is there is no way to tell when reading an 8-byte integer from a
file whether its value is really 0, or if it merely has 0 in its
least-significant 4 bytes. If 99% of such stored numbers are below
2^31, one is going to need some extra logic to catch the other 1%
where you (silently) get the wrong value. In essence, unless you're
certain that you will never come across a number that actually uses
the upper 4 bytes, you will always have to read it as two 4-byte
numbers and check that the high-order one (which is endianness
dependent, of course) is zero. A C-level sanity check seems more
efficient and more helpful to me.

Seems to me that the S-PLUS solution (output="double") would be a lot
more useful. I'd commit that if you write it; I don't think I'd commit
the warning.


Duncan Murdoch
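
To make the "return a double" idea concrete, here is a rough C sketch that decodes 8 big-endian bytes as an unsigned 64-bit integer and returns the value as a double. The function name be64_to_double is hypothetical; this is not how S-PLUS or R implements anything, only an illustration of the conversion. Note that a double represents integers exactly only up to 2^53, so larger values would be rounded.

```c
#include <stdint.h>

/* Hypothetical helper: decode 8 big-endian bytes as an unsigned
 * 64-bit integer and return it as a double.
 * The result is exact only for values up to 2^53. */
static double be64_to_double(const unsigned char *buf)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];   /* most significant byte comes first */

    return (double)v;
}
```

Applied to the bytes 0,0,0,1,0,0,0,0 from the original example, this yields 4294967296 rather than 0.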

Pretending that it's really only four bytes because of
the limits of R's integer type isn't all that helpful. Perhaps a
warning should be put out if the cast will affect the value of the
result? It looks like the relevant lines in src/main/connections.c are
3689-3697 in the current alpha:

#if SIZEOF_LONG == 8
                   case sizeof(long):
                       INTEGER(ans)[i] = (int)*((long *)buf);
                       break;
#elif SIZEOF_LONG_LONG == 8
                   case sizeof(_lli_t):
                       INTEGER(ans)[i] = (int)*((_lli_t *)buf);
                       break;
#endif
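
The kind of check being proposed could look roughly like the sketch below. It is not the code in connections.c: narrow_checked is a hypothetical helper, and how R would report a failure (a warning, an NA, or something else) is deliberately left open.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: narrow a 64-bit value to int only if that
 * does not change the value.
 * Returns 1 and stores the result in *out on success; returns 0 and
 * leaves *out untouched if v does not fit in an int. */
static int narrow_checked(int64_t v, int *out)
{
    if (v < INT_MIN || v > INT_MAX)
        return 0;               /* the cast would alter the value */

    *out = (int)v;
    return 1;
}
```

A caller would test the return value and decide what to do with an out-of-range element, instead of silently storing the truncated result.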

) The value can be represented as a double,
though:

4294967296

[1] 4294967296

I wouldn't expect readBin() to return a double if an integer was
requested, but is there any way to get the correct value out of it?


Trivially (for your unsigned big-endian case):

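# read the bytes as consecutive 4-byte signed integers
y <- readBin(x, "integer", n = length(x) / 4L, endian = "big")
# reinterpret negative values as unsigned (the result is double)
y <- ifelse(y < 0, 2^32 + y, y)
# combine each (high, low) pair of words into a single value
i <- seq(1, length(y), 2)
y <- y[i] * 2^32 + y[i + 1L]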


Thanks for the code, but I'm not sure I would call that trivial,
especially if one needs to cater for little endian and signed cases as
well!


I meant for your case, and it's trivial in the sense of: read as integers,
convert to double precision and add.
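
The "read as integers, convert to double precision and add" recipe, extended to the little-endian and signed cases raised above, can be sketched in C as follows. The function words_to_double and its parameters are hypothetical names for this example. It assumes the two 32-bit words have already been byte-swapped to host order, and, as with any double, the result is exact only up to 2^53 in magnitude.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 32-bit words into one double.
 *   w             the two words in file order
 *   little_endian nonzero if the low-order word comes first
 *   is_signed     nonzero if the 64-bit value is two's-complement signed */
static double words_to_double(const int32_t w[2], int little_endian,
                              int is_signed)
{
    const double two32 = 4294967296.0;   /* 2^32 */
    double hi = (double)(little_endian ? w[1] : w[0]);
    double lo = (double)(little_endian ? w[0] : w[1]);

    if (lo < 0)
        lo += two32;    /* the low word is always treated as unsigned */
    if (!is_signed && hi < 0)
        hi += two32;    /* the high word keeps its sign only if signed */

    return hi * two32 + lo;
}
```

For the words (1, 0) in big-endian order this gives 4294967296; for (-1, -1) read as signed it gives -1.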

This is what I meant by reconstructing the number manually...


You didn't say so - you were talking about reconstructing it from a raw vector 
which seems a lot more painful since you can't compute with enough precision on 
raw vectors.


True - I should have been more specific. Sorry.

Jon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

