> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Simon Urbanek
> Sent: Tuesday, March 29, 2011 6:49 PM
> To: Duncan Murdoch
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Reading 64-bit integers
>
>
> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>
> > On 29/03/2011 7:01 PM, Jon Clayden wrote:
> >> Dear Simon,
> >>
> >> On 29 March 2011 22:40, Simon Urbanek<simon.urba...@r-project.org> wrote:
> >>> Jon,
> >>>
> >>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
> >>>
> >>>> Dear Simon,
> >>>>
> >>>> Thank you for the response.
> >>>>
> >>>> On 29 March 2011 15:06, Simon Urbanek<simon.urba...@r-project.org> wrote:
> >>>>>
> >>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>>
> >>>>>> I see from some previous threads that support for 64-bit integers in R
> >>>>>> may be an aim for future versions, but in the meantime I'm wondering
> >>>>>> whether it is possible to read in integers of greater than 32 bits at
> >>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
> >>>>>> integers to some degree, but it is clearly limited in practice by R's
> >>>>>> internally 32-bit integer type:
> >>>>>>
> >>>>>>> x<- as.raw(c(0,0,0,0,1,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 16777216
> >>>>>>> x<- as.raw(c(0,0,0,1,0,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 0
> >>>>>>
> >>>>>> For values that fit into 32 bits it works fine, but for larger values
> >>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
> >>>>>> NA if it is out of range?
> >>>>>
> >>>>> No, it's not out of range - int is only 4 bytes, so only the first 4
> >>>>> bytes (respecting endianness order, hence LSB) are used.
> >>>>
> >>>> The fact remains that I ask for the value of an 8-byte integer and
> >>>> don't get it.
> >>>
> >>> I think you're misinterpreting the documentation:
> >>>
> >>> If 'size' is specified and not the natural size of the object,
> >>> each element of the vector is coerced to an appropriate type
> >>> before being written or as it is read.
> >>>
> >>> The "integer" object type is defined as signed 32-bit in R, so if you
> >>> ask for "8 bytes into object type integer", you get a coercion into
> >>> that object type -- 32-bit signed integer -- as documented. I think
> >>> the issue may come from confusing the object type "integer" with
> >>> "integer number" in the general mathematical sense, which has no
> >>> representation restrictions. (FWIW, in C the "integer" type is "int"
> >>> and it is 32-bit on all modern OSes regardless of platform - that's
> >>> where the limitation comes from; it's not something R has made up.)
> >>
> >> OK, but it still seems like there is a case for raising a warning. As
> >> it is, there is no way to tell when reading an 8-byte integer from a
> >> file whether its value is really 0, or whether it merely has 0 in its
> >> least-significant 4 bytes. If 99% of such stored numbers are below
> >> 2^31, one is going to need some extra logic to catch the other 1%
> >> where you (silently) get the wrong value. In essence, unless you're
> >> certain that you will never come across a number that actually uses
> >> the upper 4 bytes, you will always have to read it as two 4-byte
> >> numbers and check that the high-order one (which is endianness
> >> dependent, of course) is zero. A C-level sanity check seems more
> >> efficient and more helpful to me.
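[For concreteness, a minimal sketch of the two-word check Jon describes, assuming the stored values are unsigned 8-byte integers in big-endian order; the helper name read_uint64_checked is made up for illustration, and the result comes back as a double, so it is only exact up to 2^53:]

read_uint64_checked <- function(raw8) {
    words <- readBin(raw8, "integer", n = 2L, size = 4L, endian = "big")
    words <- ifelse(words < 0, words + 2^32, words)  # reinterpret each 32-bit word as unsigned
    hi <- words[1]; lo <- words[2]                   # big-endian: high-order word comes first
    if (hi != 0)
        warning("value does not fit in 32 bits; returning a double")
    hi * 2^32 + lo
}

x <- as.raw(c(0,0,0,1,0,0,0,0))
read_uint64_checked(x)   # 4294967296, with a warning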
> >
> > Seems to me that the S-PLUS solution (output="double") would be a lot
> > more useful. I'd commit that if you write it; I don't think I'd commit
> > the warning.
> >
>
> I was going to write something similar (idea = good, patch welcome ;)).
> My only worry is that the "output" argument is a bit misleading, in that
> one could expect to use any combination of "input"/"output", which may
> be a maintenance nightmare. If I understand it correctly, it's only a
> special case for integer input. I don't have S+ so can't say how they
> deal with that.
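[R's readBin() has no 'output' argument today, but a rough user-level approximation of the S-PLUS behaviour discussed above - reading n signed 8-byte integers into a double vector - might look like the sketch below. The function name readBin64 and the little-endian default are assumptions for illustration, and exactness is of course lost beyond 2^53:]

readBin64 <- function(con, n = 1L, endian = "little") {
    w <- readBin(con, "integer", n = 2L * n, size = 4L, endian = endian)
    if (endian == "big") {
        hi <- w[c(TRUE, FALSE)]; lo <- w[c(FALSE, TRUE)]   # high-order word first
    } else {
        lo <- w[c(TRUE, FALSE)]; hi <- w[c(FALSE, TRUE)]   # low-order word first
    }
    lo <- ifelse(lo < 0, lo + 2^32, lo)  # low word is unsigned; the sign lives in the high word
    hi * 2^32 + lo
}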
In S+'s readBin the 'output' argument can only be double() or single()
when 'what' is double() or single() (S+ still has a real single-precision
storage mode), and it can be any numeric type or logical when 'what' is
integer(). The output=double() case seemed like the only useful one. It
does not warn when precision is lost in the 8-byte-integer-to-double
conversion; perhaps it should.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> Cheers,
> Simon
>
>
> >
> >>
> >>>> Pretending that it's really only four bytes because of the limits
> >>>> of R's integer type isn't all that helpful. Perhaps a warning
> >>>> should be put out if the cast will affect the value of the result?
> >>>> It looks like the relevant lines in src/main/connections.c are
> >>>> 3689-3697 in the current alpha:
> >>>>
> >>>> #if SIZEOF_LONG == 8
> >>>>     case sizeof(long):
> >>>>         INTEGER(ans)[i] = (int)*((long *)buf);
> >>>>         break;
> >>>> #elif SIZEOF_LONG_LONG == 8
> >>>>     case sizeof(_lli_t):
> >>>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
> >>>>         break;
> >>>> #endif
> >>>>
> >>>>>> ) The value can be represented as a double, though:
> >>>>>>
> >>>>>>> 4294967296
> >>>>>> [1] 4294967296
> >>>>>>
> >>>>>> I wouldn't expect readBin() to return a double if an integer was
> >>>>>> requested, but is there any way to get the correct value out of it?
> >>>>>
> >>>>> Trivially (for your unsigned big-endian case):
> >>>>>
> >>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
> >>>>> y<- ifelse(y< 0, 2^32 + y, y)
> >>>>> i<- seq(1,length(y),2)
> >>>>> y<- y[i] * 2^32 + y[i + 1L]
> >>>>
> >>>> Thanks for the code, but I'm not sure I would call that trivial,
> >>>> especially if one needs to cater for little-endian and signed cases
> >>>> as well!
> >>>
> >>> I was saying for your case, and it's trivial as in: read as integers,
> >>> convert to double precision and add.
> >>>
> >>>
> >>>> This is what I meant by reconstructing the number manually...
> >>>>
> >>>
> >>> You didn't say so - you were talking about reconstructing it from a
> >>> raw vector, which seems a lot more painful since you can't compute
> >>> with enough precision on raw vectors.
> >>
> >> True - I should have been more specific. Sorry.
> >>
> >> Jon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
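[As a footnote to Bill's last point, a sketch of the kind of precision check he suggests could accompany an output=double() conversion: doubles represent integers exactly only up to 2^53, so any converted value at or beyond that magnitude may have been rounded. The helper name is hypothetical:]

warn_if_imprecise <- function(x) {
    if (any(abs(x) >= 2^53, na.rm = TRUE))
        warning("some 8-byte integer values are >= 2^53 and may have lost precision")
    x
}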