Have you benchmarked these potential drawbacks for your use case? E.g. memory
usage depends on the structure of the identifiers, given how R stores
characters internally.
Given all the issues raised here, I would 100% provide a script for reading
the data into R, if this is for distribution.
Best,
Kasper
Not sure how we got from int8 to int64 ... but for what it is worth, I
recently a) needed 64-bit integers to represent nanosecond timestamps (which
then became the still new-ish CRAN package 'nanotime') and b) found the
support in package bit64 for its bit64::integer64 type to be easy to use and
performant.
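A minimal sketch of the bit64 approach mentioned above, assuming the CRAN package 'bit64' is installed (the identifier values are made up to straddle the 2^53 double-precision boundary):

```r
library(bit64)

x <- as.integer64("9007199254740993")  # 2^53 + 1, not exactly representable as double
y <- as.integer64("9007199254740992")  # 2^53

x == y                        # FALSE: full 64-bit precision is kept
as.double(x) == as.double(y)  # TRUE:  precision is lost once coerced to double
```

Since integer64 supports comparison and arithmetic, such columns can be used as join keys without the precision loss a plain numeric would incur.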
You might want to use a data.table then.
It will automatically detect that it is a 64 bit int.
Although also in that case the user will have to install the data.table
package.
(which is a good idea anyway in my opinion :) )
It will then obviously allow you to join tables.
Willem
On 20-01-17 18:4
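A sketch of the data.table suggestion above, assuming the CRAN packages 'data.table' and 'bit64' (needed for integer64 columns) are installed; the file and column names here are hypothetical:

```r
library(data.table)

# Write a small CSV with identifiers too large for int32:
path <- tempfile(fileext = ".csv")
fwrite(data.table(id = c("9007199254740993", "9007199254740992"), v = 1:2), path)

# fread detects the large integers and reads them as bit64::integer64:
dt <- fread(path, integer64 = "integer64")
class(dt$id)          # integer64 — no precision loss
length(unique(dt$id)) # 2: the two ids stay distinct
```

With `integer64 = "integer64"` (the default), users who merely read the file get full-precision keys without any manual conversion.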
I, again, can't speak for R-core, so I may be wrong about any of this and
they are welcome to correct me, but it seems unlikely that they would
integrate a package that defines 64-bit integers in R into the core of R
without making the changes necessary to provide 64-bit integers as a
fundamental (a
For what it is worth, I would be extremely pleased to see R's integer type go
to 64-bit. A signed 32-bit integer is just a bit too small to index into the
~3 billion position human genome. The workarounds that have arisen for
this specific issue are surprisingly complex.
Pete
Hi,
I do have < INT_MAX.
This looks attractive, but since they are unique identifiers, storing
them as a factor is likely to be counter-productive (a string
version + an int32 for each).
I was looking at https://cran.r-project.org/web/packages/csvread/index.html
This looks like a good fit for
> Milan Bouchet-Valat
> on Thu, 19 Jan 2017 13:58:31 +0100 writes:
> Hi all,
> I know this issue has been discussed a few times in the past already,
> but Martin Maechler suggested in a bug report [1] that I raise it here.
>
> Basically, there is no (easy) way of printing NAs for all
How many unique identifiers do you have?
If they are large (in terms of bytes) but you don't have that many of them
(eg the total possible number you'll ever have is < INT_MAX), you could
store them as factors. You get the speed of integers but the labeling of
full "precision" strings. Factors are
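A minimal sketch of the factor suggestion above (the identifier values are made up):

```r
# Each unique identifier string is stored once as a factor level; rows hold
# only small int32 codes, which is what comparisons and joins operate on.
ids <- c("1311071933951566764", "1311071933951566765", "1311071933951566764")
f <- factor(ids)

as.integer(f)   # 1 2 1 — compact integer codes, duplicates share a code
levels(f)       # the full-precision strings, kept once each
```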
Well, I definitely cannot use them as numeric, because joining is the main
purpose of those identifiers.
About the int64 and bit64 packages: they are not a solution, because I am
releasing a dataset for external users. I cannot ask them to install a
package in order to use it.
I have to be very careful
> 2^53 == 2^53 + 1
[1] TRUE
Which makes joining or grouping data sets with 64-bit identifiers
problematic.
Murray (mobile)
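The identity above can be checked directly in base R; a short sketch of how the collision breaks grouping (the values are the smallest pair that collides):

```r
a <- 9007199254740993   # 2^53 + 1: silently rounded when parsed as a double
b <- 9007199254740992   # 2^53

a == b                  # TRUE — two distinct ids have become one value
length(unique(c(a, b))) # 1, so grouping on such "ids" merges distinct rows
```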
On Jan 20, 2017 9:15 AM, "Nicolas Paris" wrote:
Le 20 janv. 2017 à 18h09, Murray Stokely écrivait :
> The lack of 64 bit integer support causes lots of problems when dealing with
Right, they are identifiers.
Storing them as String has drawbacks:
- huge to store in memory
- slow to process
- huge to index (e.g. by data.table column indexes)
Why not store them as numeric?
Thanks,
Le 20 janv. 2017 à 18h16, William Dunlap écrivait :
> If these are identifiers, store them
If these are identifiers, store them as strings. If not, what sort of
calculations do you plan on doing with them?
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote:
> Hello r users,
>
> I have to deal with int8 data with R. AFAIK R does only handle int4
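A base-R sketch of the string approach suggested above (table and column names are made up): strings join exactly, with no rounding of large identifiers.

```r
# Identifiers kept as character join without precision loss:
left  <- data.frame(id = c("9007199254740993", "9007199254740992"),
                    x = 1:2, stringsAsFactors = FALSE)
right <- data.frame(id = "9007199254740993", y = 10,
                    stringsAsFactors = FALSE)

merge(left, right, by = "id")   # exact string match: exactly one row
```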
Le 20 janv. 2017 à 18h09, Murray Stokely écrivait :
> The lack of 64 bit integer support causes lots of problems when dealing with
> certain types of data where the loss of precision from coercing to 53 bits
> with
> double is unacceptable.
Hello Murray,
Do you mean, e.g., -1311071933951566764 l
The lack of 64 bit integer support causes lots of problems when dealing
with certain types of data where the loss of precision from coercing to 53
bits with double is unacceptable.
Two packages were developed to deal with this: int64 and bit64.
You may need to find archival versions of these packages
I am not on R-core, so cannot speak to future plans to internally support
int8 (though my impression is that there aren't any, at least none that are
close to fruition).
The standard way of dealing with whole numbers too big to fit in an integer
is to put them in a numeric (double down in C land).
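The limits of that standard approach can be seen in base R, with no packages assumed:

```r
.Machine$integer.max   # 2147483647: the int32 ceiling R's integer type has
as.integer(2^31)       # NA with a warning — past the int32 range
2^53 - 1 != 2^53       # TRUE:  doubles are exact for whole numbers up to 2^53
2^53 + 1 == 2^53       # TRUE:  beyond that, distinct integers collide
```

So a numeric works for whole-number magnitudes up to 2^53, but not for full 64-bit identifiers.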
Hello r users,
I have to deal with int8 data with R. AFAIK R does only handle int4
with the `as.integer` function [1]. I wonder:
1. what is the better approach to handle int8? `as.character`?
`as.numeric`?
2. is there any plan to handle int8 in the future? As you might know,
int4 is too small to d
Hi!
I noticed that cumsum behaves differently from the other cumulative functions
with respect to NaN values:
> values <- c(1,2,NaN,1)
> for ( f in c(cumsum, cumprod, cummin, cummax)) print(f(values))
[1] 1 3 NA NA
[1] 1 2 NaN NaN
[1] 1 1 NaN NaN
[1] 1 2 NaN NaN
The reason is that cumsum (in cu