Re: [Rd] How to handle INT8 data

2017-01-20 Thread Kasper Daniel Hansen
Have you benchmarked these potential drawbacks for your use case? E.g., memory depends on the structure of the identifiers, given how R stores characters internally. Given all the issues raised here, I would 100% provide a script for reading the data into R, if this is for distribution. Best, Kasper

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Dirk Eddelbuettel
Not sure how we got from int8 to int64 ... but for what it is worth, I recently a) needed 64-bit integers to represent nanosecond timestamps (which then became the still new-ish CRAN package 'nanotime') and b) found the support in package bit64 for its bit64::integer64 to be easy to use and perfo

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Willem Ligtenberg
You might want to use a data.table then. It will automatically detect that it is a 64 bit int. Although also in that case the user will have to install the data.table package. (which is a good idea anyway in my opinion :) ) It will then obviously allow you to join tables. Willem On 20-01-17 18:4

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
I, again, can't speak for R-core so I may be wrong about any of this and they are welcome to correct me but it seems unlikely that they would integrate a package that defines 64 bit integers in R into the core of R without making the changes necessary to provide 64 bit integers as a fundamental (a

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Peter Haverty
For what it is worth, I would be extremely pleased to see R's integer type go to 64-bit. A signed 32-bit integer is just a bit too small to index into the ~3 billion position human genome. The "workarounds" that have arisen for this specific issue are surprisingly complex. Pete
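The size limit mentioned above can be checked in base R. This is only a minimal sketch: the figure of 3e9 positions is a rough stand-in for "~3 billion", and the variable names are made up for illustration.

```r
# R's integer type is a signed 32-bit value; its largest value is 2^31 - 1.
int_max <- .Machine$integer.max            # 2147483647

# Rough stand-in for "~3 billion positions" (an assumption, not an exact genome length).
genome_positions <- 3e9                    # stored as a double

# The position count does not fit into R's integer type.
fits_in_integer <- genome_positions <= int_max

# Coercing an out-of-range double to integer yields NA (with a warning).
coerced <- suppressWarnings(as.integer(genome_positions))

print(int_max)
print(fits_in_integer)
print(coerced)
```

Positions of this size can still be held exactly in a double, since they are far below 2^53, but they can no longer be held in an R integer vector.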

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Hi, I do have < INT_MAX. This looks attractive, but since they are unique identifiers, storing them as factors is likely to be counterproductive. (a string version + an int32 for each) I was looking at https://cran.r-project.org/web/packages/csvread/index.html This looks like a good fit for

Re: [Rd] xtabs(), factors and NAs

2017-01-20 Thread Martin Maechler
> Milan Bouchet-Valat > on Thu, 19 Jan 2017 13:58:31 +0100 writes: > Hi all, > I know this issue has been discussed a few times in the past already, > but Martin Maechler suggested in a bug report [1] that I raise it here. > > Basically, there is no (easy) way of printing NAs for all

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
How many unique identifiers do you have? If they are large (in terms of bytes) but you don't have that many of them (e.g. the total possible number you'll ever have is < INT_MAX), you could store them as factors. You get the speed of integers but the labeling of full "precision" strings. Factors are
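As a rough illustration of the factor suggestion, the base R sketch below shows that a factor keeps one integer code per element plus a single table of string labels. The identifier strings are hypothetical and chosen only to exceed 2^53.

```r
# Hypothetical identifiers, kept as strings so no digits are lost.
ids <- c("9007199254740993", "9007199254740992", "9007199254740993")

f <- factor(ids)

# Internally a factor is an integer vector of codes ...
storage <- typeof(f)            # "integer"
codes   <- as.integer(f)        # one small integer per element

# ... plus one character vector holding each distinct label once.
labels <- levels(f)

# Converting back recovers the full-precision strings.
round_trip <- as.character(f)

print(storage)
print(codes)
print(labels)
print(identical(round_trip, ids))
```

When every identifier is unique, as noted elsewhere in the thread, each string is still stored once as a level, so the factor adds an integer per row without saving memory.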

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Well, I definitely cannot use them as numeric because joining is the main purpose of those identifiers. About the int64 and bit64 packages, they are not a solution, because I am releasing a dataset for external users. I cannot ask them to install a package in order to exploit them. I have to be very careful

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
2^53 == 2^53+1 TRUE Which makes joining or grouping data sets with 64 bit identifiers problematic. Murray (mobile) On Jan 20, 2017 9:15 AM, "Nicolas Paris" wrote: On 20 Jan. 2017 at 18:09, Murray Stokely wrote: > The lack of 64 bit integer support causes lots of problems when dealing with
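The comparison quoted above can be reproduced in base R, together with its effect on key matching. This is a sketch only; the variable names are illustrative and the string identifiers are hypothetical.

```r
# A double has a 53-bit significand, so 2^53 + 1 cannot be represented
# and rounds back to 2^53.
x <- 2^53
collides <- (x == x + 1)

# Below that threshold, neighbouring integers are still distinct.
distinct_below <- ((x - 1) != x)

# Consequence for joins: a lookup for the key 2^53 + 1 "finds" the row keyed 2^53.
keys      <- c(2^53, 2^53 + 2)
wrong_hit <- match(2^53 + 1, keys)

# The same identifiers kept as strings (hypothetical values) do not collide.
key_strings <- c("9007199254740992", "9007199254740994")
no_hit      <- match("9007199254740993", key_strings)

print(collides)
print(distinct_below)
print(wrong_hit)
print(no_hit)
```

Here `match()` stands in for the key lookup that a join or grouping performs; with double keys the lookup returns position 1, while with string keys it returns NA.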

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Right, they are identifiers. Storing them as strings has drawbacks: - huge to store in memory - slow to process - huge to index (by e.g. data.table column indexes) Why not store them as numeric? Thanks, On 20 Jan. 2017 at 18:16, William Dunlap wrote: > If these are identifiers, store them

Re: [Rd] How to handle INT8 data

2017-01-20 Thread William Dunlap via R-devel
If these are identifiers, store them as strings. If not, what sort of calculations do you plan on doing with them? Bill Dunlap TIBCO Software wdunlap tibco.com On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote: > Hello r users, > > I have to deal with int8 data with R. AFAIK R does only han

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
On 20 Jan. 2017 at 18:09, Murray Stokely wrote: > The lack of 64 bit integer support causes lots of problems when dealing with > certain types of data where the loss of precision from coercing to 53 bits > with > double is unacceptable. Hello Murray, Do you mean, by eg. -1311071933951566764 l

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
The lack of 64 bit integer support causes lots of problems when dealing with certain types of data where the loss of precision from coercing to 53 bits with double is unacceptable. Two packages were developed to deal with this: int64 and bit64. You may need to find archival versions of these pac

Re: [Rd] How to handle INT8 data

2017-01-20 Thread Gabriel Becker
I am not on R-core, so cannot speak to future plans to internally support int8 (though my impression is that there aren't any, at least none that are close to fruition). The standard way of dealing with whole numbers too big to fit in an integer is to put them in a numeric (double down in C land).

[Rd] How to handle INT8 data

2017-01-20 Thread Nicolas Paris
Hello R users, I have to deal with int8 data in R. AFAIK R only handles int4 with the `as.integer` function [1]. I wonder: 1. what is the best approach to handle int8? `as.character`? `as.numeric`? 2. is there any plan to handle int8 in the future? As you might know, int4 is too small to d

[Rd] NaN behavior of cumsum

2017-01-20 Thread Lukas Stadler
Hi! I noticed that cumsum behaves differently from the other cumulative functions wrt. NaN values: > values <- c(1,2,NaN,1) > for ( f in c(cumsum, cumprod, cummin, cummax)) print(f(values)) [1] 1 3 NA NA [1] 1 2 NaN NaN [1] 1 1 NaN NaN [1] 1 2 NaN NaN The reason is that cumsum (in cu