Draft patch attached. I haven't modified internal code before, so there may be a mistake in how I handle the mechanics, but hopefully this is a useful starting point. At any rate, the base package tests still pass and it seems to function as intended:
> x <- as.raw(c(0,0,0,1,0,0,0,0))
> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
[1] 0
> (readBin(x,"integer",n=1,size=8,signed=F,endian="big",double.out=T))
[1] 4294967296
> storage.mode(readBin(x,"integer",n=1,size=8,signed=F,endian="big",double.out=T))
[1] "double"

The "double.out" argument is ignored unless "what" is "integer". As far as I can tell, there is no definition of unsigned long long akin to the one for long long (at the top of connections.c), so I have not handled the unsigned case for that type.

The diff is against the current beta, but I can provide an SVN diff against the trunk if that is preferable.

All the best,
Jon

On 30 March 2011 02:49, Simon Urbanek <simon.urba...@r-project.org> wrote:
>
> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>
>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>> Dear Simon,
>>>
>>> On 29 March 2011 22:40, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>>> Jon,
>>>>
>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>
>>>>> Dear Simon,
>>>>>
>>>>> Thank you for the response.
>>>>>
>>>>> On 29 March 2011 15:06, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>>>>>
>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>>> all.
>>>>>>> Judging from ?readBin, it should be possible to read 8-byte
>>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>>> internally 32-bit integer type:
>>>>>>>
>>>>>>>> x <- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>> [1] 16777216
>>>>>>>> x <- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>> [1] 0
>>>>>>>
>>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>>>>> NA if it is out of range?
>>>>>>
>>>>>> No, it's not out of range - int is only 4 bytes so only 4 first bytes
>>>>>> (respecting endianness order, hence LSB) are used.
>>>>>
>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>> don't get it.
>>>>
>>>> I think you're misinterpreting the documentation:
>>>>
>>>>      If ‘size’ is specified and not the natural size of the object,
>>>>      each element of the vector is coerced to an appropriate type
>>>>      before being written or as it is read.
>>>>
>>>> The "integer" object type is defined as signed 32-bit in R, so if you ask
>>>> for "8 bytes into object type integer", you get a coercion into that
>>>> object type -- 32-bit signed integer -- as documented. I think the issue
>>>> may come from the confusion of the object type "integer" with general
>>>> "integer number" in mathematical sense that has no representation
>>>> restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on
>>>> all modern OSes regardless of platform - that's where the limitation comes
>>>> from, it's not something R has made up).
>>>
>>> OK, but it still seems like there is a case for raising a warning. As
>>> it is there is no way to tell when reading an 8-byte integer from a
>>> file whether its value is really 0, or if it merely has 0 in its
>>> least-significant 4 bytes.
>>> If 99% of such stored numbers are below
>>> 2^31, one is going to need some extra logic to catch the other 1%
>>> where you (silently) get the wrong value. In essence, unless you're
>>> certain that you will never come across a number that actually uses
>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>> numbers and check that the high-order one (which is endianness
>>> dependent, of course) is zero. A C-level sanity check seems more
>>> efficient and more helpful to me.
>>
>> Seems to me that the S-PLUS solution (output="double") would be a lot more
>> useful. I'd commit that if you write it; I don't think I'd commit the
>> warning.
>>
>
> I was going to write something similar (idea = good, patch welcome ;)). My
> only worry is that the "output" argument is a bit misleading in that one
> could expect to use any combination of "input"/"output" which may be a
> maintenance nightmare. If I understand it correctly it's only a special case
> for integer input. I don't have S+ so can't say how they deal with that.
>
> Cheers,
> Simon
>
>
>>
>>>
>>>>> Pretending that it's really only four bytes because of
>>>>> the limits of R's integer type isn't all that helpful. Perhaps a
>>>>> warning should be put out if the cast will affect the value of the
>>>>> result? It looks like the relevant lines in src/main/connections.c are
>>>>> 3689-3697 in the current alpha:
>>>>>
>>>>> #if SIZEOF_LONG == 8
>>>>>     case sizeof(long):
>>>>>         INTEGER(ans)[i] = (int)*((long *)buf);
>>>>>         break;
>>>>> #elif SIZEOF_LONG_LONG == 8
>>>>>     case sizeof(_lli_t):
>>>>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>>         break;
>>>>> #endif
>>>>>
>>>>>>> ) The value can be represented as a double,
>>>>>>> though:
>>>>>>>
>>>>>>>> 4294967296
>>>>>>> [1] 4294967296
>>>>>>>
>>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>>> requested, but is there any way to get the correct value out of it?
>>>>>>
>>>>>> Trivially (for your unsigned big-endian case):
>>>>>>
>>>>>> y <- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>>> y <- ifelse(y < 0, 2^32 + y, y)
>>>>>> i <- seq(1,length(y),2)
>>>>>> y <- y[i] * 2^32 + y[i + 1L]
>>>>>
>>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>>> especially if one needs to cater for little endian and signed cases as
>>>>> well!
>>>>
>>>> I was saying for your case and it's trivial as in read as integers,
>>>> convert to double precision and add.
>>>>
>>>>
>>>>> This is what I meant by reconstructing the number manually...
>>>>>
>>>>
>>>> You didn't say so - you were talking about reconstructing it from a raw
>>>> vector which seems a lot more painful since you can't compute with enough
>>>> precision on raw vectors.
>>>
>>> True - I should have been more specific. Sorry.
>>>
>>> Jon
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
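To make the approaches discussed in this thread concrete, here is a minimal standalone C sketch. It is not the attached patch and not the actual code in connections.c; the function names read_int64, read8_as_double, fits_in_int and combine_words are hypothetical. read8_as_double shows what the double.out idea amounts to (assemble the eight bytes in the requested byte order and convert to double rather than casting to int), fits_in_int is the kind of check discussed for detecting when the (int) cast would change the value, and combine_words is a C rendering of the R workaround that reads two 32-bit words and adds them as hi * 2^32 + lo. It assumes C99 <stdint.h> and, like the _lli_t path quoted above, treats the 8-byte value as signed.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble 8 raw bytes into a signed 64-bit integer.
   big_endian != 0 means buf[0] is the most significant byte. */
static int64_t read_int64(const unsigned char buf[8], int big_endian)
{
    uint64_t u = 0;
    for (int k = 0; k < 8; k++) {
        /* visit bytes from most to least significant */
        u = (u << 8) | buf[big_endian ? k : 7 - k];
    }
    int64_t s;
    memcpy(&s, &u, sizeof s); /* reinterpret the bits as two's complement */
    return s;
}

/* Sketch of the double.out idea: return the full value as a double
   instead of casting to int. Exact only while |value| <= 2^53. */
static double read8_as_double(const unsigned char buf[8], int big_endian)
{
    return (double)read_int64(buf, big_endian);
}

/* Sanity check discussed in the thread: would an (int) cast
   leave the value unchanged? */
static int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* C version of the quoted R workaround: take the two 32-bit words
   (high-order first), map negative words to their unsigned value,
   and combine them as hi * 2^32 + lo. */
static double combine_words(int32_t hi, int32_t lo)
{
    const double two32 = 4294967296.0; /* 2^32 */
    double h = hi < 0 ? two32 + hi : hi;
    double l = lo < 0 ? two32 + lo : lo;
    return h * two32 + l;
}
```

For the bytes 00 00 00 01 00 00 00 00 used in the examples, read8_as_double(buf, 1) gives 4294967296 and fits_in_int reports that an (int) cast would not preserve it, which is exactly the case where readBin currently returns 0. A double holds integers exactly only up to 2^53 in magnitude, so larger 8-byte values would come back rounded rather than exact.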