On Wed, Mar 30, 2011 at 11:22 AM, Simon Urbanek
<simon.urba...@r-project.org> wrote:
> Bill,
>
> thanks. I like that idea of the output parameter better, especially if we
> ever add different scalar vector types. Admittedly, what=integer() is the
> most useful case. What I was worried about is things like what=double(),
> output=integer() which could be legal, but are more conveniently dealt with
> via as.integer(readBin()) instead.

What about this: Let the default be output=what. Then, just have the
function throw an error for unsupported combinations of 'what' and
'output'. Something like (assuming 'what' and 'output' already have
been converted to "type" strings):

  # Validate argument 'output':
  if (output != what) {
    # In most cases, we never get here.
    also <- list(integer="double")[[what]];
    if (is.null(also) || !is.element(output, also)) {
      # Throw an informative error message
      stop("Unsupported value of argument 'output' (\"", output,
           "\"). Supported output types when reading \"", what,
           "\" values: ", paste(c(what, also), collapse=", "));
    }
  }

That should prevent any unintended usage (before wasting time with
I/O). It also allows for future extension.

Thxs

/Henrik

> I won't have more time today, but I'll have a look tomorrow.
>
> Thanks,
> Simon
>
>
> On Mar 30, 2011, at 1:38 PM, William Dunlap wrote:
>
>>
>>> -----Original Message-----
>>> From: r-devel-boun...@r-project.org
>>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Simon Urbanek
>>> Sent: Tuesday, March 29, 2011 6:49 PM
>>> To: Duncan Murdoch
>>> Cc: r-devel@r-project.org
>>> Subject: Re: [Rd] Reading 64-bit integers
>>>
>>>
>>> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>>>
>>>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>>>> Dear Simon,
>>>>>
>>>>> On 29 March 2011 22:40, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>>>>> Jon,
>>>>>>
>>>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>>>
>>>>>>> Dear Simon,
>>>>>>>
>>>>>>> Thank you for the response.
>>>>>>>
>>>>>>> On 29 March 2011 15:06, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>>>>>>>
>>>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>>>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>>>>> internally 32-bit integer type:
>>>>>>>>>
>>>>>>>>>> x<- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>>> [1] 16777216
>>>>>>>>>> x<- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>>>>>>> [1] 0
>>>>>>>>>
>>>>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>>>>>>> NA if it is out of range?
>>>>>>>>
>>>>>>>> No, it's not out of range - int is only 4 bytes so only 4 first bytes
>>>>>>>> (respecting endianness order, hence LSB) are used.
>>>>>>>
>>>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>>>> don't get it.
>>>>>>
>>>>>> I think you're misinterpreting the documentation:
>>>>>>
>>>>>> If 'size' is specified and not the natural size of the object,
>>>>>> each element of the vector is coerced to an appropriate type
>>>>>> before being written or as it is read.
>>>>>>
>>>>>> The "integer" object type is defined as signed 32-bit in R, so if you
>>>>>> ask for "8 bytes into object type integer", you get a coercion into
>>>>>> that object type -- 32-bit signed integer -- as documented.
>>>>>> I think the issue may come from the confusion of the object type
>>>>>> "integer" with general "integer number" in mathematical sense that
>>>>>> has no representation restrictions. (FWIW in C the "integer" type is
>>>>>> "int" and it is 32-bit on all modern OSes regardless of platform -
>>>>>> that's where the limitation comes from, it's not something R has
>>>>>> made up).
>>>>>
>>>>> OK, but it still seems like there is a case for raising a warning. As
>>>>> it is there is no way to tell when reading an 8-byte integer from a
>>>>> file whether its value is really 0, or if it merely has 0 in its
>>>>> least-significant 4 bytes. If 99% of such stored numbers are below
>>>>> 2^31, one is going to need some extra logic to catch the other 1%
>>>>> where you (silently) get the wrong value. In essence, unless you're
>>>>> certain that you will never come across a number that actually uses
>>>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>>>> numbers and check that the high-order one (which is endianness
>>>>> dependent, of course) is zero. A C-level sanity check seems more
>>>>> efficient and more helpful to me.
>>>>
>>>> Seems to me that the S-PLUS solution (output="double") would be a lot
>>>> more useful. I'd commit that if you write it; I don't think I'd commit
>>>> the warning.
>>>>
>>>
>>> I was going to write some thing similar (idea = good, patch
>>> welcome ;)). My only worry is that the "output" argument is a
>>> bit misleading in that one could expect to use any
>>> combination of "input"/"output" which may be a maintenance
>>> nightmare. If I understand it correctly it's only a special
>>> case for integer input. I don't have S+ so can't say how they
>>> deal with that.
>>
>> In S+'s readBin the output argument can be
>> only double() or single() when what is double()
>> or single() (S+ still has a real single
>> precision storage mode) and can be any
>> numeric type or logical when what is integer().
>>
>> The output=double() seemed like the only useful case.
>>
>> It does not warn when precision is lost in the 8-byte
>> integer to double conversion. Perhaps it should.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>>
>>>>>
>>>>>>> Pretending that it's really only four bytes because of
>>>>>>> the limits of R's integer type isn't all that helpful. Perhaps a
>>>>>>> warning should be put out if the cast will affect the value of the
>>>>>>> result? It looks like the relevant lines in src/main/connections.c are
>>>>>>> 3689-3697 in the current alpha:
>>>>>>>
>>>>>>> #if SIZEOF_LONG == 8
>>>>>>>     case sizeof(long):
>>>>>>>         INTEGER(ans)[i] = (int)*((long *)buf);
>>>>>>>         break;
>>>>>>> #elif SIZEOF_LONG_LONG == 8
>>>>>>>     case sizeof(_lli_t):
>>>>>>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>>>>>>         break;
>>>>>>> #endif
>>>>>>>
>>>>>>>>> ) The value can be represented as a double,
>>>>>>>>> though:
>>>>>>>>>
>>>>>>>>>> 4294967296
>>>>>>>>> [1] 4294967296
>>>>>>>>>
>>>>>>>>> I wouldn't expect readBin() to return a double if an integer was
>>>>>>>>> requested, but is there any way to get the correct value out of it?
>>>>>>>>
>>>>>>>> Trivially (for your unsigned big-endian case):
>>>>>>>>
>>>>>>>> y<- readBin(x, "integer", n=length(x)/4L, endian="big")
>>>>>>>> y<- ifelse(y< 0, 2^32 + y, y)
>>>>>>>> i<- seq(1,length(y),2)
>>>>>>>> y<- y[i] * 2^32 + y[i + 1L]
>>>>>>>
>>>>>>> Thanks for the code, but I'm not sure I would call that trivial,
>>>>>>> especially if one needs to cater for little endian and signed cases as
>>>>>>> well!
>>>>>>
>>>>>> I was saying for your case and it's trivial as in read as integers,
>>>>>> convert to double precision and add.
>>>>>>
>>>>>>
>>>>>>> This is what I meant by reconstructing the number manually...
>>>>>>>
>>>>>>
>>>>>> You didn't say so - you were talking about reconstructing it from a
>>>>>> raw vector which seems a lot more painful since you can't compute
>>>>>> with enough precision on raw vectors.
>>>>>
>>>>> True - I should have been more specific. Sorry.
>>>>>
>>>>> Jon
>>>>>
>>>>> ______________________________________________
>>>>> R-devel@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
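
For reference, the workaround discussed in this thread (read the value in
smaller pieces, then combine the pieces in double precision) can be sketched
in R so that it also covers the little-endian and signed cases. This is only
a sketch under stated assumptions, not the proposed readBin(output=) change:
the name read_int64_as_double() and its arguments are hypothetical. It reads
unsigned 2-byte pieces rather than 4-byte ones, since a 4-byte piece equal to
0x80000000 has the same bit pattern as R's NA_integer_ and would come back as
NA. Doubles hold integers exactly only up to 2^53 in magnitude, so the sketch
warns beyond that, along the lines Bill suggested.

```r
# Hypothetical helper, not part of base R: read 8-byte integers from a raw
# vector (or binary connection) and return them as doubles.
read_int64_as_double <- function(x, n = 1L, signed = TRUE,
                                 endian = c("big", "little")) {
  endian <- match.arg(endian)
  # Four unsigned 16-bit pieces per value; readBin() honours 'signed'
  # only for sizes 1 and 2, which is why 2-byte pieces are used here.
  w <- readBin(x, what = "integer", n = 4L * n, size = 2L,
               signed = FALSE, endian = endian)
  stopifnot(length(w) %% 4L == 0L)
  w <- matrix(as.double(w), nrow = 4L)
  # Put the most significant piece in row 1.
  if (endian == "little") w <- w[4:1, , drop = FALSE]
  hi <- w[1L, ]
  # Two's complement: a set top bit means the value is negative.
  if (signed) hi <- ifelse(hi >= 32768, hi - 65536, hi)
  value <- ((hi * 65536 + w[2L, ]) * 65536 + w[3L, ]) * 65536 + w[4L, ]
  # Doubles represent integers exactly only up to 2^53 in magnitude.
  if (any(abs(value) > 2^53)) {
    warning("some values exceed 2^53 and may have lost precision")
  }
  value
}

# The example from the start of the thread, which readBin() returned as 0:
x <- as.raw(c(0, 0, 0, 1, 0, 0, 0, 0))
read_int64_as_double(x, endian = "big")          # 4294967296, i.e. 2^32
read_int64_as_double(rev(x), endian = "little")  # the same value, byte-swapped
```

Applying the sign correction to the top piece before combining keeps small
negative values such as -1 exact; subtracting 2^64 from the combined result
would not, because 2^64 - 1 is not representable as a double.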