To summarise this thread, there are basically three ways of handling int64 in R:
* coerce to character
* coerce to double
* store in a double

There is no ideal solution, and each has pros and cons that I've attempted to summarise below.

## Coerce to character

This is the easiest approach if the data is used as identifiers. It will have some performance drawbacks when loading and will require additional memory. It should not have negative performance implications once the data has been loaded, because R has a global string pool, so string comparisons only require a single pointer comparison (assuming the strings have the same encoding).

## Coerce to double

This is the easiest approach if your integers are in the range [-(2^53), 2^53] or you can tolerate some minor loss of precision.

## Store in a double

This technique takes advantage of the fact that doubles and int64s are the same size (8 bytes), so you can store the binary representation of an int64 in a double. The values will effectively be garbage if you treat the vector as a double, so this requires adding an S3 class and overriding every generic function with a custom method. Not all functions are generic, and internal C code will not know about the special class, so this carries the danger of code silently interpreting the data incorrectly.

This is the approach taken by the bit64 package (and, I believe, the int64 package, but since that's been archived it's not worth considering).

Hadley

On Fri, Jan 20, 2017 at 9:19 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:
> I am not on R-core, so cannot speak to future plans to internally support
> int8 (though my impression is that there aren't any, at least none that are
> close to fruition).
>
> The standard way of dealing with whole numbers too big to fit in an integer
> is to put them in a numeric (double down in C land). This can represent
> integers up to 2^53 without loss of precision (see
> http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
> This is how long vector indices are (currently) implemented in R.
> If it's good enough for indices it's probably good enough for whatever you
> need them for.
>
> Hope that helps.
>
> ~G
>
>
> On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris <nicolas.pa...@aphp.fr>
> wrote:
>
>> Hello R users,
>>
>> I have to deal with int8 data with R. AFAIK R only handles int4
>> with the `as.integer` function [1]. I wonder:
>> 1. What is the best approach to handle int8? `as.character`?
>> `as.numeric`?
>> 2. Is there any plan to handle int8 in the future? As you might know,
>> int4 is too small to deal with the earth's population right now.
>>
>> Thanks for your ideas,
>>
>> int8 example:
>>
>> human_id
>> ----------------------
>> -1311071933951566764
>> -4708675461424073238
>> -6865005668390999818
>> 5578000650960353108
>> -3219674686933841021
>> -6469229889308771589
>> -606871692563545028
>> -8199987422425699249
>> -463287495999648233
>> 7675955260644241951
>>
>> reference:
>> 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>>
>> --
>> Nicolas PARIS
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research

--
http://hadley.nz
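To make the "coerce to double" and "store in a double" options concrete, here is a small sketch. It uses Python only because its `float` is the same IEEE 754 double that R uses for numeric vectors; it is an illustration under that assumption, not how bit64 is implemented. The helper names `survives_double()`, `pack_int64()` and `unpack_int64()` are hypothetical: the first checks whether an integer round-trips through a double, and the other two copy an int64's raw 64 bits into a double and back without any numeric conversion.

```python
import struct

TWO_53 = 2 ** 53  # largest magnitude below which every integer is an exact double


def survives_double(x: int) -> bool:
    """'Coerce to double': True if x is unchanged after a round trip through a double."""
    return int(float(x)) == x


def pack_int64(x: int) -> float:
    """'Store in a double': reinterpret the 64 bits of a signed int64 as a double.

    The result is meaningless as a number; only the bits matter.
    For the few bit patterns that decode to a NaN, exact preservation
    of the bits can depend on the platform.
    """
    return struct.unpack("<d", struct.pack("<q", x))[0]


def unpack_int64(d: float) -> int:
    """Recover the int64 whose bits were stored in d by pack_int64()."""
    return struct.unpack("<q", struct.pack("<d", d))[0]


if __name__ == "__main__":
    human_id = -1311071933951566764  # first value from the original question
    print("2^53 exact as double:    ", survives_double(TWO_53))
    print("2^53 + 1 exact as double:", survives_double(TWO_53 + 1))
    print("human_id exact as double:", survives_double(human_id))
    packed = pack_int64(human_id)
    print("packed bits read as a double:", packed)
    print("bits recovered intact:", unpack_int64(packed) == human_id)
```

With the first `human_id` from the question, `survives_double()` is False: the value is above 2^60, where adjacent doubles are 256 apart, and it is not a multiple of 256. The packed version round-trips exactly, but printing it shows the kind of garbage double that non-generic code would see if it treated the vector as numeric.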