On 8/19/2009 1:49 PM, miller_2555 wrote:

Roger Bivand wrote:

On Tue, 5 Dec 2006, Yoni Schamroth wrote:

Hi,

I am attempting to query a data frame from a mysql database.
One of the variables is a unique identification number ("numeric") 18 digits long.
I am struggling to retrieve this variable exactly without any rounding.

Read it as a character - a double is a double:

> x <- 6527600583317876352
> y <- 6527600583317876380
> all.equal(x, y)
[1] TRUE
> storage.mode(x)
[1] "double"

and why they are equal is a FAQ (a double carries only about 16 significant decimal digits). R's integer type is 4-byte. Since they are IDs, not values to be used for arithmetic, leave them as character strings - which is what they really are, like telephone numbers.


Resurrecting this post for a moment, the same issue arose when interfacing R
with a Postgres database using the bigint data type (a signed 64-bit integer
ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 as of
this writing). While the underlying cause is summarized above, I'd like to
recommend including a 64-bit integer data type in base R. For
performance reasons, I use R to independently generate a unique transaction
ID that is equivalent to the Postgres-generated bigint (with some
consistency checks --  generally bad design, but vastly quicker than
querying the database for the same value). I currently generate a string
representation and pass that to the DBI, though the process is cumbersome
and likely not as efficient as an arithmetic equivalent (particularly when
using a 64-bit processor architecture). Furthermore, there are additional
gyrations that need to occur when querying the database for bigint values.
Do significant practical challenges exist in the implementation of a 64-bit
integer that would outweigh the faster and cleaner compatibility with
database backends?

I believe the C99 standard doesn't require that an exact-width 64-bit signed integer type exist (only one that is at least 64 bits wide), so that would likely cause some headaches. And we may still use some compilers that are not C99 compliant, which may not have any type that big.

But an even bigger problem is that there is a lot of type-specific code in R. Adding another primitive type like a 64-bit signed integer would mean writing arithmetic routines for that type and deciding how it interacts with all the other numeric types. For example: what if you add a floating-point double to a 64-bit int? Normally adding a double to an int coerces the result to double. But a double isn't big enough to hold every 64-bit int exactly, so doing something like x + 1 could lose precision in x.

So I imagine this will happen eventually, but it will not be easy, and it probably won't happen soon.

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
