date:20170121

Re: [Rd] How to handle INT8 data

2017-01-21 Thread Jeroen Ooms

On Fri, Jan 20, 2017 at 6:09 PM, Murray Stokely  wrote:
> The lack of 64 bit integer support causes lots of problems when dealing
> with certain types of data where the loss of precision from coercing to 53
> bits with double is unacceptable.
>
> Two packages were developed to deal with this:  int64 and bit64.

Don't forget packages for arbitrarily large numbers, such as Rmpfr
and openssl.

  x <- openssl::bignum("12345678987654321")
  x^10

The risk of storing int64 as a double (e.g. in bit64) is that it might
easily be mistaken for a completely different value via unclass() or
Rf_isNumeric() or so.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] xtabs(), factors and NAs

2017-01-21 Thread Milan Bouchet-Valat

Le vendredi 20 janvier 2017 à 18:59 +0100, Martin Maechler a écrit :
> >>>>> Milan Bouchet-Valat
> >>>>>     on Thu, 19 Jan 2017 13:58:31 +0100 writes:
> > Hi all,
> > I know this issue has been discussed a few times in the past already,
> > but Martin Maechler suggested in a bug report [1] that I raise it here.
> > 
> > Basically, there is no (easy) way of printing NAs for all variables
> > when calling xtabs() on factors. Passing 'exclude=NULL,
> > na.action=na.pass' works for character vectors, but not for factors.
> > 
> 
> [ yes, but your example below is *not* showing that ... so may be
>   a bit confusing !]  {Reason: stringsAsFactors etc}
Yes, sorry, that illustrates why one should never try to make an
example prettier at the last minute. For reference, here's the correct
example:

> test <- data.frame(x=c("a",NA), stringsAsFactors=FALSE)
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
   a <NA> 
   1    1 

> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a 
1 
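
The addNA() workaround mentioned in this thread can be sketched as follows. This is only an illustration, assuming a reasonably recent R; the object names `test_na`, `tab_default` and `tab_with_na` are made up for the example.

```r
# Factor with a missing value, as in the example above.
test <- data.frame(x = factor(c("a", NA)))

# By default the NA observation is dropped from the table.
tab_default <- xtabs(~ x, data = test)

# Workaround: make NA an explicit factor level before tabulating.
# addNA() has to be applied to every factor that is crossed.
test_na <- test
test_na$x <- addNA(test_na$x)
tab_with_na <- xtabs(~ x, data = test_na)

print(tab_default)
print(tab_with_na)
```

With several crossed factors the addNA() call has to be repeated for each one, which is the inconvenience the request is about.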


> > > test <- data.frame(x=c("a",NA))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > > test <- data.frame(x=factor(c("a",NA)))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a 
> > 1 
> > 
> > 
> > Even if it's documented, this inconsistency is annoying. When checking
> > data, it is often useful to print all NA values temporarily, without
> > calling addNA() individually on all crossed variables.
> 
>   {Note this is not (just) about print()ing; the issue is
>    about the resulting *object*.}
> > 
> > Would it make sense to add a new argument similar to table()'s useNA
> > which would behave the same for all input vector types?
> 
> You have to be aware that  table()  has been changed since R
> 3.3.2, i.e., is different in R-devel and hence will be different
> in R 3.4.0.
> table()'s handling of NAs has become very involved /
> sophisticated(*), and currently I'd rather like to keep
> xtabs()'s behavior much simpler. 
> 
> Interestingly, after starting to play with data containing NA's and
>   xtabs(*, na.action=na.pass)
> I have already detected bugs (for sparse=TRUE) and cases where
> the current xtabs() behavior seems dubious to me.
> So, the issue is --- as so often --- more involved than assumed initially.
> 
> We (R core) will probably do something, but do need more time
> before we can promise anything more...
OK, thanks. Given how long this behavior has existed, there's
certainly no hurry...


Regards

> Thank you for raising the issue,
> Martin Maechler, ETH Zurich
> 
> 
> *) R-devel sources always current at
>    https://svn.r-project.org/R/trunk/src/library/base/R/table.R
> 
> > 
> > Regards
> > [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630


Re: [Rd] How to handle INT8 data

2017-01-21 Thread Hadley Wickham

To summarise this thread, there are basically three ways of handling int64 in R:

* coerce to character
* coerce to double
* store in double

There is no ideal solution, and each has pros and cons that I've
attempted to summarise below.

## Coerce to character

This is the easiest approach if the data is used as identifiers. It
will have some performance drawbacks when loading and will require
additional memory. It should not have negative performance
implications once the data has been loaded because R has a global
string pool so string comparisons only require a single pointer
comparison (assuming they have the same encoding).
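
A minimal sketch of the character approach, using three of the identifiers posted earlier in the thread. The variable names are illustrative only.

```r
# Identifiers from the thread, kept as strings so that every digit survives.
ids <- c("-1311071933951566764", "5578000650960353108", "7675955260644241951")

# Exact equality and lookup work as for any character vector.
pos <- match("5578000650960353108", ids)
exact <- identical(ids[pos], "5578000650960353108")

# The cost is memory: each element is a pointer to a cached string
# rather than a single 8-byte number.
size_chr <- as.numeric(object.size(ids))
size_dbl <- as.numeric(object.size(numeric(length(ids))))

print(pos)
print(exact)
print(c(character = size_chr, double = size_dbl))
```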

## Coerce to double

This is the easiest approach if your integers are in the range
[-(2^53), 2^53] or you can tolerate some minor loss of precision.
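
To make the precision limit concrete, here is a small base R sketch. The first identifier comes from this thread; the second is a made-up neighbour that differs only in the last digit.

```r
# Doubles have a 53-bit significand, so integers are exact only up to 2^53.
exact_limit <- 2^53
collides <- (exact_limit + 1) == exact_limit   # 2^53 + 1 is not representable

# Two different 19-digit identifiers become the same double after coercion.
id_a <- "5578000650960353108"
id_b <- "5578000650960353109"   # hypothetical neighbouring id
same_after_coercion <- as.numeric(id_a) == as.numeric(id_b)

# Below the limit, neighbouring integers stay distinct.
distinct_below <- (exact_limit - 1) != (exact_limit - 2)

print(c(collides, same_after_coercion, distinct_below))
```

If identifiers this large have to be joined or compared, coercion to double can therefore merge distinct keys without any warning.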

## Store in a double

This technique takes advantage of the fact that doubles and int64s are
the same size, so you can store the binary representation of an int64
in a double. This will effectively be garbage if you treat the vector
as if it is a double, so it requires adding an S3 class and overriding
every generic function with a custom method. Not all functions are
generic, and internal C code will not know about the special class, so
this has the danger of code silently interpreting the data
incorrectly.

This is the approach taken by the bit64 package (and, I believe, the
int64 package, but since that's been archived it's not worth
considering).
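
The "garbage if you treat the vector as a double" point can be illustrated in base R without any package, by reinterpreting the eight bytes of a 64-bit integer as a double. This is a sketch of the general idea only, not of how bit64 is implemented; the byte vectors are written out by hand and assume little-endian order.

```r
# The 64-bit integer 1, written out as eight little-endian bytes.
int64_one <- as.raw(c(1, 0, 0, 0, 0, 0, 0, 0))

# The 64-bit integer 4607182418800017408 (0x3FF0000000000000).
int64_big <- as.raw(c(0, 0, 0, 0, 0, 0, 0xf0, 0x3f))

# Reinterpret the same bytes as doubles, which is what code sees when it
# ignores the class attribute of an int64-in-double vector.
one_as_double <- readBin(int64_one, what = "double", size = 8, endian = "little")
big_as_double <- readBin(int64_big, what = "double", size = 8, endian = "little")

print(one_as_double)   # a tiny positive number, not 1
print(big_as_double)   # this bit pattern happens to be the double 1
```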

Hadley

On Fri, Jan 20, 2017 at 9:19 AM, Gabriel Becker  wrote:
> I am not on R-core, so cannot speak to future plans to internally support
> int8 (though my impression is that there aren't any, at least none that are
> close to fruition).
>
> The standard way of dealing with whole numbers too big to fit in an integer
> is to put them in a numeric (double down in C land). This can represent
> integers up to 2^53 without loss of precision (see
> http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
> This is how long vector indices are (currently) implemented in R. If it's
> good enough for indices it's probably good enough for whatever you need
> them for.
>
> Hope that helps.
>
> ~G
>
>
> On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris 
> wrote:
>
>> Hello r users,
>>
>> I have to deal with int8 data with R. AFAIK R only handles int4
>> with the `as.integer` function [1]. I wonder:
>> 1. what is the best approach to handle int8? `as.character`?
>> `as.numeric`?
>> 2. is there any plan to handle int8 in the future? As you might know,
>> int4 is too small to deal with earth population right now.
>>
>> Thanks for your ideas,
>>
>> int8 eg:
>>
>>  human_id
>> --
>>  -1311071933951566764
>>  -4708675461424073238
>>  -6865005668390999818
>>   5578000650960353108
>>  -3219674686933841021
>>  -6469229889308771589
>>   -606871692563545028
>>  -8199987422425699249
>>   -463287495999648233
>>   7675955260644241951
>>
>> reference:
>> 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>>
>> --
>> Nicolas PARIS
>>
>>
>
>
>
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research
>

-- 
http://hadley.nz


Re: [Rd] bug in rbind?

2017-01-21 Thread Joshua Ulrich

I'm not sure whether or not this is a bug, but I did isolate the line
where the error is thrown:
src/library/base/R/dataframe.R:1395.
https://github.com/wch/r-source/blob/01374c3c367fa12f555fd354f735a6e16e5bd98e/src/library/base/R/dataframe.R#L1395

The error is thrown because the line attempts to set a subset of the
rownames to NULL, which fails.

R> options(error = recover)
R> rbind(dfm.names, dfm)
Error in rownames(value[[jj]])[ri] <- rownames(xij) :
  replacement has length zero

Enter a frame number, or 0 to exit

1: rbind(dfm.names, dfm)
2: rbind(deparse.level, ...)

Selection: 2
Called from: top level
Browse[1]> rownames(value[[jj]])
[1] "a" "b" "c" NA  NA  NA
Browse[1]> rownames(xij)
NULL
Browse[1]> ri
[1] 4 5 6
Browse[1]> rownames(value[[jj]])[ri]
[1] NA NA NA
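
The failing assignment can be reproduced in isolation with the values shown in the debugger session. The guard at the end is a hypothetical illustration of how such a case could be skipped; it is not the fix adopted in R.

```r
# Row names as seen in the debugger session above.
rn <- c("a", "b", "c", NA, NA, NA)
ri <- 4:6
new_names <- NULL   # what rownames(xij) returns for the matrix without row names

# Assigning NULL into a subset of an atomic vector is an error.
failed <- inherits(try(rn[ri] <- new_names, silent = TRUE), "try-error")
print(failed)

# Hypothetical guard: only assign when there are names to copy.
if (!is.null(new_names)) rn[ri] <- new_names
print(rn)
```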


On Mon, Jan 16, 2017 at 7:50 PM, Krzysztof Banas  wrote:
> I suspect there may be a bug in base::rbind.data.frame
>
> Below there is minimal example of the problem:
>
> m <- matrix (1:12, 3)
> dfm <- data.frame (c = 1 : 3, m = I (m))
> str (dfm)
>
> m.names <- m
> rownames (m.names) <- letters [1:3]
> dfm.names <- data.frame (c = 1 : 3, m = I (m.names))
> str (dfm.names)
>
> rbind (m, m.names)
> rbind (m.names, m)
> rbind (dfm, dfm.names)
>
> #not working
> rbind (dfm.names, dfm)
>
> Error in rbind(deparse.level, ...) : replacement has length zero
>
> rbind (dfm, dfm.names)$m
>
>
>   [,1] [,2] [,3] [,4]
>      1    4    7   10
>      2    5    8   11
>      3    6    9   12
> a    1    4    7   10
> b    2    5    8   11
> c    3    6    9   12



-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2016 | www.rinfinance.com


Re: [Rd] How to handle INT8 data

2017-01-21 Thread Dirk Eddelbuettel


On 21 January 2017 at 10:56, Hadley Wickham wrote:
| To summarise this thread, there are basically three ways of handling int64 in R:
| 
| * coerce to character
| * coerce to double
| * store in double
| 
| ## Coerce to character

Serious performance loss.
 
| ## Coerce to double

Serious precision + functionality loss.

Remember, int64, not int53, is what we are after. That is what the other
systems we want to interoperate with have (bigtable indices).

| ## Store in a double

Best approach in my book, and done in bit64::integer64.

| This is the approach taken by the bit64 package (and, I believe, the

Incorrect.

The int64 package used an S4 class with two int32. The bit64 package
documentation has a bit on the comparison. But as int64 is abandonware it
doesn't matter either way.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

