On Wed, Mar 30, 2011 at 7:51 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> On Wed, Mar 30, 2011 at 11:22 AM, Simon Urbanek
> <simon.urba...@r-project.org> wrote:
>> Bill,
>>
>> thanks. I like that idea of the output parameter better, especially if we
>> ever add different scalar vector types. Admittedly, what=integer() is the
>> most useful case. What I was worried about is things like what=double(),
>> output=integer() which could be legal, but are more conveniently dealt with
>> via as.integer(readBin()) instead.
>
> What about this:
>
> Let the default be output=what. Then, just throw an error upon the
> function for non-supported combinations of 'what' and 'output'.
> Something like (assuming 'what' and 'output' already have been
> converted to "type" strings):
>
> # Validate argument 'output':
> if (output != what) {
>   # In most cases, we never get here.
>   also <- list(integer="double")[[what]];
>   if (is.null(also) || !is.element(output, also)) {

if (!is.element(output, also)) { should be enough. /H

>     # Throw an informative error message
>     stop("Unsupported value of argument 'output' (\"", output, "\"). ",
>          "Supported output types when reading \"", what, "\" values: ",
>          paste(c(what, also), collapse=", "));
>   }
> }
>
> That should prevent any unintended usage (before wasting time with
> I/O). It also allows for future extension.
>
> Thxs
>
> /Henrik
>
>> I won't have more time today, but I'll have a look tomorrow.
>>
>> Thanks,
>> Simon
>>
>>
>> On Mar 30, 2011, at 1:38 PM, William Dunlap wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: r-devel-boun...@r-project.org
>>>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Simon Urbanek
>>>> Sent: Tuesday, March 29, 2011 6:49 PM
>>>> To: Duncan Murdoch
>>>> Cc: r-devel@r-project.org
>>>> Subject: Re: [Rd] Reading 64-bit integers
>>>>
>>>>
>>>> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>>>>
>>>>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>>>>> Dear Simon,
>>>>>>
>>>>>> On 29 March 2011 22:40, Simon Urbanek<simon.urba...@r-project.org> wrote:
>>>>>>> Jon,
>>>>>>>
>>>>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>>>>
>>>>>>>> Dear Simon,
>>>>>>>>
>>>>>>>> Thank you for the response.
>>>>>>>>
>>>>>>>> On 29 March 2011 15:06, Simon Urbanek<simon.urba...@r-project.org> wrote:
>>>>>>>>>
>>>>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>>>>
>>>>>>>>>> Dear all,
>>>>>>>>>>
>>>>>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>>>>>> all.
>>>>>>>>>> Judging from ?readBin, it should be possible to read 8-byte
>>>>>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>>>>>> internally 32-bit integer type:
>>>>>>>>>>
>>>>>>>>>> > x<- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>>>>> > (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>>>> [1] 16777216
>>>>>>>>>> > x<- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>>>>> > (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>>>> [1] 0
>>>>>>>>>>
>>>>>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>>>>>>>> NA if it is out of range?
>>>>>>>>>
>>>>>>>>> No, it's not out of range - int is only 4 bytes so only 4 first bytes
>>>>>>>>> (respecting endianness order, hence LSB) are used.
>>>>>>>>
>>>>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>>>>> don't get it.
>>>>>>>
>>>>>>> I think you're misinterpreting the documentation:
>>>>>>>
>>>>>>> If 'size' is specified and not the natural size of the object,
>>>>>>> each element of the vector is coerced to an appropriate type
>>>>>>> before being written or as it is read.
>>>>>>>
>>>>>>> The "integer" object type is defined as signed 32-bit in R, so if you
>>>>>>> ask for "8 bytes into object type integer", you get a coercion into
>>>>>>> that object type -- 32-bit signed integer -- as documented. I think
>>>>>>> the issue may come from the confusion of the object type "integer"
>>>>>>> with general "integer number" in mathematical sense that has no
>>>>>>> representation restrictions. (FWIW in C the "integer" type is "int"
>>>>>>> and it is 32-bit on all modern OSes regardless of platform - that's
>>>>>>> where the limitation comes from, it's not something R has made up).
>>>>>>
>>>>>> OK, but it still seems like there is a case for raising a warning.
>>>>>> As it is there is no way to tell when reading an 8-byte integer from a
>>>>>> file whether its value is really 0, or if it merely has 0 in its
>>>>>> least-significant 4 bytes. If 99% of such stored numbers are below
>>>>>> 2^31, one is going to need some extra logic to catch the other 1%
>>>>>> where you (silently) get the wrong value. In essence, unless you're
>>>>>> certain that you will never come across a number that actually uses
>>>>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>>>>> numbers and check that the high-order one (which is endianness
>>>>>> dependent, of course) is zero. A C-level sanity check seems more
>>>>>> efficient and more helpful to me.
>>>>>
>>>>> Seems to me that the S-PLUS solution (output="double") would be a lot
>>>>> more useful. I'd commit that if you write it; I don't think I'd
>>>>> commit the warning.
>>>>>
>>>>
>>>> I was going to write something similar (idea = good, patch
>>>> welcome ;)). My only worry is that the "output" argument is a
>>>> bit misleading in that one could expect to use any
>>>> combination of "input"/"output" which may be a maintenance
>>>> nightmare. If I understand it correctly it's only a special
>>>> case for integer input. I don't have S+ so can't say how they
>>>> deal with that.
>>>
>>> In S+'s readBin the output argument can be
>>> only double() or single() when what is double()
>>> or single() (S+ still has a real single
>>> precision storage mode) and can be any
>>> numeric type or logical when what is integer().
>>>
>>> The output=double() seemed like the only useful case.
>>>
>>> It does not warn when precision is lost in the 8-byte
>>> integer to double conversion. Perhaps it should.
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>>
>>>> Cheers,
>>>> Simon
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>> Pretending that it's really only four bytes because of
>>>>>>>> the limits of R's integer type isn't all that helpful.
>>>>>>>> Perhaps a
>>>>>>>> warning should be put out if the cast will affect the value of the
>>>>>>>> result? It looks like the relevant lines in src/main/connections.c are
>>>>>>>> 3689-3697 in the current alpha:
>>>>>>>>
>>>>>>>> #if SIZEOF_LONG == 8
>>>>>>>>     case sizeof(long):
>>>>>>>>         INTEGER(ans)[i] = (int)*((long *)buf);
>>>>>>>>         break;
>>>>>>>> #elif SIZEOF_LONG_LONG == 8
>>>>>>>>     case sizeof(_lli_t):
>>>>>>>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>>>>>         break;
>>>>>>>> #endif
>>>>>>>>
>>>>>>>>>> ) The value can be represented as a double,
>>>>>>>>>> though:
>>>>>>>>>>
>>>>>>>>>> > 4294967296
>>>>>>>>>> [1] 4294967296
>>>>>>>>>>
>>>>>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>>>>>> requested, but is there any way to get the correct value out of it?
>>>>>>>>>
>>>>>>>>> Trivially (for your unsigned big-endian case):
>>>>>>>>>
>>>>>>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>>>>>> y<- ifelse(y< 0, 2^32 + y, y)
>>>>>>>>> i<- seq(1,length(y),2)
>>>>>>>>> y<- y[i] * 2^32 + y[i + 1L]
>>>>>>>>
>>>>>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>>>>>> especially if one needs to cater for little endian and signed cases as
>>>>>>>> well!
>>>>>>>
>>>>>>> I was saying for your case and it's trivial as in read as integers,
>>>>>>> convert to double precision and add.
>>>>>>>
>>>>>>>
>>>>>>>> This is what I meant by reconstructing the number manually...
>>>>>>>>
>>>>>>>
>>>>>>> You didn't say so - you were talking about reconstructing it from a
>>>>>>> raw vector which seems a lot more painful since you can't compute
>>>>>>> with enough precision on raw vectors.
>>>>>>
>>>>>> True - I should have been more specific. Sorry.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
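
For anyone who needs this before readBin() grows an 'output' argument, here is a minimal sketch of the "read as smaller integers, convert to double precision and add" approach from the thread, extended to the little-endian and signed cases Jon mentions. The function name read_int64_as_double and its arguments are hypothetical, not part of base R. It is also a variation on Simon's snippet rather than a copy: it reads unsigned 2-byte words instead of signed 4-byte words because, as far as I can tell, a 4-byte word equal to 0x80000000 would come back as NA_integer_. The warning at 2^53 is only a rough way of flagging the precision loss Bill refers to.

```r
# Hypothetical helper, not part of base R: read 8-byte integers as doubles.
# 'con' may be a connection or a raw vector, as for readBin().
read_int64_as_double <- function(con, n = 1L, signed = TRUE,
                                 endian = .Platform$endian) {
  endian <- match.arg(endian, c("big", "little"))
  # Four unsigned 16-bit words per value; each lies in 0..65535, never NA.
  w <- readBin(con, what = "integer", n = 4L * n, size = 2L,
               signed = FALSE, endian = endian)
  if (length(w) %% 4L != 0L)
    stop("input ends in the middle of an 8-byte value")
  # One column per value, one row per 16-bit word.
  m <- matrix(as.double(w), nrow = 4L)
  # Put the most significant word in row 1 regardless of byte order.
  if (endian == "little") m <- m[4:1, , drop = FALSE]
  hi <- m[1L, ]
  # For signed input, reinterpret the top word as two's complement.
  if (signed) hi <- ifelse(hi >= 2^15, hi - 2^16, hi)
  # Horner evaluation keeps small results (e.g. -1) exact.
  y <- ((hi * 2^16 + m[2L, ]) * 2^16 + m[3L, ]) * 2^16 + m[4L, ]
  if (any(abs(y) > 2^53))
    warning("some values exceed 2^53 and may have been rounded")
  y
}

# Jon's second example: 2^32 stored as an unsigned big-endian 8-byte integer.
x <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))
read_int64_as_double(x, n = 1L, signed = FALSE, endian = "big")
```

Evaluating from the most significant word downwards means a value such as -1 (all bytes 0xFF, signed) is reconstructed exactly, instead of being computed as the difference of two numbers near 2^64.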