Bill, thanks. I like that idea of the output parameter better, especially if we ever add different scalar vector types. Admittedly, what=integer() is the most useful case. What I was worried about is things like what=double(), output=integer() which could be legal, but are more conveniently dealt with via as.integer(readBin()) instead. I won't have more time today, but I'll have a look tomorrow.
Thanks,
Simon


On Mar 30, 2011, at 1:38 PM, William Dunlap wrote:

>
>> -----Original Message-----
>> From: r-devel-boun...@r-project.org
>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Simon Urbanek
>> Sent: Tuesday, March 29, 2011 6:49 PM
>> To: Duncan Murdoch
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] Reading 64-bit integers
>>
>>
>> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>>
>>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>>> Dear Simon,
>>>>
>>>> On 29 March 2011 22:40, Simon Urbanek<simon.urba...@r-project.org> wrote:
>>>>> Jon,
>>>>>
>>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>>
>>>>>> Dear Simon,
>>>>>>
>>>>>> Thank you for the response.
>>>>>>
>>>>>> On 29 March 2011 15:06, Simon Urbanek<simon.urba...@r-project.org> wrote:
>>>>>>>
>>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>>>> internally 32-bit integer type:
>>>>>>>>
>>>>>>>>> x<- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>> [1] 16777216
>>>>>>>>> x<- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>> [1] 0
>>>>>>>>
>>>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>>>>>> NA if it is out of range?
>>>>>>>
>>>>>>> No, it's not out of range - int is only 4 bytes so only 4 first bytes
>>>>>>> (respecting endianness order, hence LSB) are used.
>>>>>>
>>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>>> don't get it.
>>>>>
>>>>> I think you're misinterpreting the documentation:
>>>>>
>>>>>   If 'size' is specified and not the natural size of the object,
>>>>>   each element of the vector is coerced to an appropriate type
>>>>>   before being written or as it is read.
>>>>>
>>>>> The "integer" object type is defined as signed 32-bit in R, so if you
>>>>> ask for "8 bytes into object type integer", you get a coercion into
>>>>> that object type -- 32-bit signed integer -- as documented. I think
>>>>> the issue may come from the confusion of the object type "integer"
>>>>> with general "integer number" in mathematical sense that has no
>>>>> representation restrictions. (FWIW in C the "integer" type is "int"
>>>>> and it is 32-bit on all modern OSes regardless of platform - that's
>>>>> where the limitation comes from, it's not something R has made up).
>>>>
>>>> OK, but it still seems like there is a case for raising a warning. As
>>>> it is there is no way to tell when reading an 8-byte integer from a
>>>> file whether its value is really 0, or if it merely has 0 in its
>>>> least-significant 4 bytes. If 99% of such stored numbers are below
>>>> 2^31, one is going to need some extra logic to catch the other 1%
>>>> where you (silently) get the wrong value. In essence, unless you're
>>>> certain that you will never come across a number that actually uses
>>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>>> numbers and check that the high-order one (which is endianness
>>>> dependent, of course) is zero. A C-level sanity check seems more
>>>> efficient and more helpful to me.
>>>
>>> Seems to me that the S-PLUS solution (output="double") would be a lot
>>> more useful. I'd commit that if you write it; I don't think I'd commit
>>> the warning.
>>>
>>
>> I was going to write some thing similar (idea = good, patch welcome ;)).
>> My only worry is that the "output" argument is a bit misleading in that
>> one could expect to use any combination of "input"/"output" which may
>> be a maintenance nightmare. If I understand it correctly it's only a
>> special case for integer input. I don't have S+ so can't say how they
>> deal with that.
>
> In S+'s readBin the output argument can be
> only double() or single() when what is double()
> or single() (S+ still has a real single
> precision storage mode) and can be any
> numeric type or logical when what is integer().
>
> The output=double() seemed like the only useful case.
>
> It does not warn when precision is lost in the 8-byte
> integer to double conversion. Perhaps it should.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> Cheers,
>> Simon
>>
>>
>>>
>>>>
>>>>>> Pretending that it's really only four bytes because of
>>>>>> the limits of R's integer type isn't all that helpful. Perhaps a
>>>>>> warning should be put out if the cast will affect the value of the
>>>>>> result? It looks like the relevant lines in src/main/connections.c are
>>>>>> 3689-3697 in the current alpha:
>>>>>>
>>>>>> #if SIZEOF_LONG == 8
>>>>>>     case sizeof(long):
>>>>>>         INTEGER(ans)[i] = (int)*((long *)buf);
>>>>>>         break;
>>>>>> #elif SIZEOF_LONG_LONG == 8
>>>>>>     case sizeof(_lli_t):
>>>>>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>>>         break;
>>>>>> #endif
>>>>>>
>>>>>>>> ) The value can be represented as a double,
>>>>>>>> though:
>>>>>>>>
>>>>>>>>> 4294967296
>>>>>>>> [1] 4294967296
>>>>>>>>
>>>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>>>> requested, but is there any way to get the correct value out of it?
>>>>>>>
>>>>>>> Trivially (for your unsigned big-endian case):
>>>>>>>
>>>>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>>>> y<- ifelse(y< 0, 2^32 + y, y)
>>>>>>> i<- seq(1,length(y),2)
>>>>>>> y<- y[i] * 2^32 + y[i + 1L]
>>>>>>
>>>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>>>> especially if one needs to cater for little endian and signed cases as
>>>>>> well!
>>>>>
>>>>> I was saying for your case and it's trivial as in read as integers,
>>>>> convert to double precision and add.
>>>>>
>>>>>
>>>>>> This is what I meant by reconstructing the number manually...
>>>>>>
>>>>>
>>>>> You didn't say so - you were talking about reconstructing it from a
>>>>> raw vector which seems a lot more painful since you can't compute
>>>>> with enough precision on raw vectors.
>>>>
>>>> True - I should have been more specific. Sorry.
>>>>
>>>> Jon
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
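
For the little-endian and signed cases raised in the thread, here is a minimal sketch that generalises the "read the pieces, convert to double precision and add" workaround. It is not part of base R: the function name read_int64_as_double and its arguments are hypothetical. It builds each value from individual bytes rather than 4-byte words, which should avoid a word of 0x80000000 being read as NA_integer_. It also warns when a result reaches 2^53 in magnitude, since beyond that point a double cannot represent every integer exactly.

```r
# Hypothetical helper (not a base R function): decode 8-byte integers held
# in a raw vector into doubles.
read_int64_as_double <- function(x, endian = c("big", "little"), signed = FALSE) {
  endian <- match.arg(endian)
  stopifnot(is.raw(x), length(x) %% 8L == 0L)

  # One column per 8-byte value, each byte as a double in 0..255.
  b <- matrix(as.numeric(x), nrow = 8L)
  # Put every column into big-endian (most significant byte first) order.
  if (endian == "little") b <- b[8:1, , drop = FALSE]

  # Combine bytes into two unsigned 32-bit halves; both are exact in a double.
  w  <- 256^(3:0)
  hi <- colSums(b[1:4, , drop = FALSE] * w)
  lo <- colSums(b[5:8, , drop = FALSE] * w)

  # Two's complement: only the high half carries the sign.
  if (signed) hi <- ifelse(hi >= 2^31, hi - 2^32, hi)

  val <- hi * 2^32 + lo
  if (any(abs(val) >= 2^53))
    warning("some values are at or beyond 2^53 and may have lost precision")
  val
}
```

With the raw vectors from the original question, read_int64_as_double(as.raw(c(0,0,0,1,0,0,0,0))) gives 4294967296 and read_int64_as_double(as.raw(c(0,0,0,0,1,0,0,0))) gives 16777216. Applying the sign correction to the high half only, before the halves are combined, keeps small negative values such as -1 exact. Subtracting 2^64 from the combined double would not.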