Re: [Rd] How to handle INT8 data

2017-01-21 Thread Dirk Eddelbuettel
On 21 January 2017 at 10:56, Hadley Wickham wrote: | To summarise this thread, there are basically three ways of handling int64 in R: | | * coerce to character | * coerce to double | * store in double | | ## Coerce to character Serious performance loss. | ## Coerce to double Serious precisi

Re: [Rd] How to handle INT8 data

2017-01-21 Thread Hadley Wickham
To summarise this thread, there are basically three ways of handling int64 in R: * coerce to character * coerce to double * store in double There is no ideal solution, and each has pros and cons that I've attempted to summarise below. ## Coerce to character This is the easiest approach if the

Re: [Rd] How to handle INT8 data

2017-01-21 Thread Jeroen Ooms
On Fri, Jan 20, 2017 at 6:09 PM, Murray Stokely wrote: > The lack of 64 bit integer support causes lots of problems when dealing > with certain types of data where the loss of precision from coercing to 53 > bits with double is unacceptable. > > Two packages were developed to deal with this: int6

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Kasper Daniel Hansen
Have you benchmarked these potential drawbacks for your use case? E.g., memory depends on the structure of the identifiers, given how R stores characters internally. Given all the issues raised here, I would 100% provide a script for reading the data into R, if this is for distribution. Best, Kasper

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Dirk Eddelbuettel
Not sure how we got from int8 to int64 ... but for what it is worth, I recently a) needed 64-bit integers to represent nanosecond timestamps (which then became the still new-ish CRAN package 'nanotime') and b) found the support in package bit64 for its bit64::integer64 to be easy to use and perfo

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Willem Ligtenberg
You might want to use a data.table then. It will automatically detect that it is a 64 bit int. Although also in that case the user will have to install the data.table package. (which is a good idea anyway in my opinion :) ) It will then obviously allow you to join tables. Willem On 20-01-17 18:4

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
I, again, can't speak for R-core so I may be wrong about any of this and they are welcome to correct me but it seems unlikely that they would integrate a package that defines 64 bit integers in R into the core of R without making the changes necessary to provide 64 bit integers as a fundamental (a

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Peter Haverty
For what it is worth, I would be extremely pleased to see R's integer type go to 64-bit. A signed 32-bit integer is just a bit too small to index into the ~3 billion position human genome. The "workarounds" that have arisen for this specific issue are surprisingly complex. Pete

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Hi, I do have < INT_MAX. This looks attractive, but since they are unique identifiers, storing them as factors is likely to be counterproductive (a string version + an int32 for each). I was looking at https://cran.r-project.org/web/packages/csvread/index.html This looks like a good fit for

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
How many unique identifiers do you have? If they are large (in terms of bytes) but you don't have that many of them (e.g. the total possible number you'll ever have is < INT_MAX), you could store them as factors. You get the speed of integers but the labeling of full "precision" strings. Factors are
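A minimal base-R sketch of the factor suggestion above. The identifier values are made up for illustration and are not taken from the poster's dataset. It only shows that a factor is stored as integer codes while its levels keep every digit of the original strings:

```r
# Hypothetical 19-digit identifiers, kept as text so no digits are lost.
ids <- c("1311071933951566764", "1311071933951566765", "1311071933951566764")

f <- factor(ids)

storage <- typeof(f)        # factors are stored as integer codes
codes   <- as.integer(f)    # one small integer per row
labels  <- levels(f)        # the full-precision strings, stored once each
back    <- as.character(f)  # round-trips to the original text
```

Whether this saves memory depends on how many distinct identifiers there are: when every value is unique, as raised elsewhere in this thread, each string is still stored once alongside an integer code per row.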

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Well, I definitely cannot use them as numeric, because joining is the main purpose of those identifiers. As for the int64 and bit64 packages, they are not a solution, because I am releasing a dataset for external users. I cannot ask them to install a package in order to use them. I have to be very careful

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
2^53 == 2^53+1 TRUE Which makes joining or grouping data sets with 64 bit identifiers problematic. Murray (mobile) On Jan 20, 2017 9:15 AM, "Nicolas Paris" wrote: On 20 Jan 2017 at 18:09, Murray Stokely wrote: > The lack of 64 bit integer support causes lots of problems when dealing with
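The comparison quoted above can be reproduced in base R. The sketch below uses two made-up identifier strings (an assumption for illustration) to show how distinct 64-bit values can collapse into one double, which is what makes joins and grouping unreliable:

```r
# A double has a 53-bit significand, so 2^53 + 1 cannot be represented.
x <- 2^53
collide <- (x == x + 1)

# Hypothetical identifiers that differ only in the last digit.
ids_chr <- c("9007199254740992", "9007199254740993")
ids_dbl <- as.numeric(ids_chr)   # coerce to double

n_chr <- length(unique(ids_chr)) # distinct values as character
n_dbl <- length(unique(ids_dbl)) # distinct values after coercion
```

If `n_dbl` is smaller than `n_chr`, rows that belong to different identifiers would be matched to each other in a join on the numeric column.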

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Right, they are identifiers. Storing them as strings has drawbacks: - huge to store in memory - slow to process - huge to index (e.g. by data.table column indexes) Why not store them as numeric? Thanks, On 20 Jan 2017 at 18:16, William Dunlap wrote: > If these are identifiers, store them

Re: [Rd] How to handle INT8 data

2017-01-20 Thread William Dunlap via R-devel
If these are identifiers, store them as strings. If not, what sort of calculations do you plan on doing with them? Bill Dunlap TIBCO Software wdunlap tibco.com On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote: > Hello r users, > > I have to deal with int8 data with R. AFAIK R does only han

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
On 20 Jan 2017 at 18:09, Murray Stokely wrote: > The lack of 64 bit integer support causes lots of problems when dealing with > certain types of data where the loss of precision from coercing to 53 bits > with > double is unacceptable. Hello Murray, Do you mean, by e.g. -1311071933951566764 l

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
The lack of 64 bit integer support causes lots of problems when dealing with certain types of data where the loss of precision from coercing to 53 bits with double is unacceptable. Two packages were developed to deal with this: int64 and bit64. You may need to find archival versions of these pac

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
I am not on R-core, so cannot speak to future plans to internally support int8 (though my impression is that there aren't any, at least none that are close to fruition). The standard way of dealing with whole numbers too big to fit in an integer is to put them in a numeric (double down in C land).

[Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Hello r users, I have to deal with int8 data with R. AFAIK R only handles int4 with the `as.integer` function [1]. I wonder: 1. what is the best approach to handle int8? `as.character`? `as.numeric`? 2. is there any plan to handle int8 in the future? As you might know, int4 is too small to d