Re: [Rd] How to handle INT8 data
On Fri, Jan 20, 2017 at 6:09 PM, Murray Stokely wrote:
> The lack of 64 bit integer support causes lots of problems when dealing
> with certain types of data where the loss of precision from coercing to 53
> bits with double is unacceptable.
>
> Two packages were developed to deal with this: int64 and bit64.

Don't forget packages for arbitrarily large numbers, such as Rmpfr and openssl:

x <- openssl::bignum("12345678987654321")
x^10

The risk of storing int64 as a double (e.g. in bit64) is that it might easily be mistaken for a completely different value via unclass() or Rf_isNumeric() or so.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
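To illustrate the risk described above, here is a minimal base-R sketch. It does not use bit64; it only emulates the idea by writing out the eight bytes of an int64 by hand and reading them back as a double with readBin(). The variable names are made up for this illustration.

```r
# Emulation in base R only; bit64 itself is not used here.
# Little-endian bytes of the int64 values 1 and 2^62.
bytes_one  <- as.raw(c(0x01, 0, 0, 0, 0, 0, 0, 0))
bytes_2p62 <- as.raw(c(0, 0, 0, 0, 0, 0, 0, 0x40))

# Reinterpret the same 8 bytes as an IEEE 754 double.
as_double_one  <- readBin(bytes_one,  what = "double", n = 1, size = 8,
                          endian = "little")
as_double_2p62 <- readBin(bytes_2p62, what = "double", n = 1, size = 8,
                          endian = "little")

print(as_double_one)   # a tiny subnormal number, not 1
print(as_double_2p62)  # 2, not 4611686018427387904
```

Any code that drops the class attribute and treats the payload as an ordinary double would see values of this kind rather than the intended integers.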
Re: [Rd] xtabs(), factors and NAs
On Friday, 20 January 2017 at 18:59 +0100, Martin Maechler wrote:
> Milan Bouchet-Valat, on Thu, 19 Jan 2017 13:58:31 +0100, writes:
>
> > Hi all,
> > I know this issue has been discussed a few times in the past already,
> > but Martin Maechler suggested in a bug report [1] that I raise it here.
> >
> > Basically, there is no (easy) way of printing NAs for all variables
> > when calling xtabs() on factors. Passing 'exclude=NULL,
> > na.action=na.pass' works for character vectors, but not for factors.
>
> [ yes, but your example below is *not* showing that ... so may be
>   a bit confusing !]   {Reason: stringsAsFactors etc}

Yes, sorry, that illustrates why one should never try to make an example prettier at the last minute. For reference, here's the correct example:

> test <- data.frame(x=c("a",NA), stringsAsFactors=FALSE)
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
   a <NA>
   1    1
> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a
1

> > > test <- data.frame(x=c("a",NA))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a
> > 1
> >
> > > test <- data.frame(x=factor(c("a",NA)))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a
> > 1
> >
> > Even if it's documented, this inconsistency is annoying. When checking
> > data, it is often useful to print all NA values temporarily, without
> > calling addNA() individually on all crossed variables.
>
> {Note this is not (just) about print()ing; the issue is
>  about the resulting *object*.}
>
> > Would it make sense to add a new argument similar to table()'s useNA
> > which would behave the same for all input vector types?
>
> You have to be aware that table() has been changed since R
> 3.3.2, i.e., is different in R-devel and hence will be different
> in R 3.4.0.
> table()'s handling of NAs has become very involved /
> sophisticated(*), and currently I'd rather like to keep
> xtabs()'s behavior much simpler.
>
> Interestingly, after starting to play with data containing NA's and
>    xtabs(*, na.action=na.pass)
> I have already detected bugs (for sparse=TRUE) and cases where
> the current xtabs() behavior seems dubious to me.
> So, the issue is --- as so often --- more involved than assumed initially.
>
> We (R core) will probably do something, but do need more time
> before we can promise anything more...

OK, thanks. Given how long this behavior has existed, there's certainly no hurry...

Regards

> Thank you for raising the issue,
> Martin Maechler, ETH Zurich
>
> *) R-devel sources always current at
>    https://svn.r-project.org/R/trunk/src/library/base/R/table.R
>
> > Regards
> >
> > [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630
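For readers who want the NA counts in the meantime, the addNA() route mentioned in the thread can be sketched as follows. This is only an illustration with a made-up two-column data frame; it assumes that xtabs() keeps an explicit NA level that is already present in a factor, which is the behaviour the thread relies on.

```r
# Hypothetical example data: two factors, each with one missing value.
test <- data.frame(x = factor(c("a", NA)), y = factor(c(NA, "b")))

# Turn NA into an explicit level on every factor column in one go,
# instead of calling addNA() by hand on each crossed variable.
is_fac <- vapply(test, is.factor, logical(1))
test[is_fac] <- lapply(test[is_fac], addNA)

# The NA level is now an ordinary level, so no row is dropped.
tab <- xtabs(~ x + y, data = test)
print(tab)
```

The cost of this approach is that the data frame itself is modified, so it is best applied to a temporary copy when checking data.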
Re: [Rd] How to handle INT8 data
To summarise this thread, there are basically three ways of handling int64 in R:

* coerce to character
* coerce to double
* store in double

There is no ideal solution, and each has pros and cons that I've attempted to summarise below.

## Coerce to character

This is the easiest approach if the data is used as identifiers. It will have some performance drawbacks when loading and will require additional memory. It should not have negative performance implications once the data has been loaded, because R has a global string pool, so string comparisons only require a single pointer comparison (assuming they have the same encoding).

## Coerce to double

This is the easiest approach if your integers are in the range [-(2^53), 2^53] or you can tolerate some minor loss of precision.

## Store in a double

This technique takes advantage of the fact that doubles and int64s are the same size, so you can store the binary representation of an int64 in a double. This will effectively be garbage if you treat the vector as if it is a double, so it requires adding an S3 class and overriding every generic function with a custom method. Not all functions are generic, and internal C code will not know about the special class, so this has the danger of code silently interpreting the data incorrectly. This is the approach taken by the bit64 package (and, I believe, the int64 package, but since that's been archived it's not worth considering).

Hadley

On Fri, Jan 20, 2017 at 9:19 AM, Gabriel Becker wrote:
> I am not on R-core, so cannot speak to future plans to internally support
> int8 (though my impression is that there aren't any, at least none that are
> close to fruition).
>
> The standard way of dealing with whole numbers too big to fit in an integer
> is to put them in a numeric (double down in C land). This can represent
> integers up to 2^53 without loss of precision (see
> http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
> This is how long vector indices are (currently) implemented in R. If it's
> good enough for indices it's probably good enough for whatever you need
> them for.
>
> Hope that helps.
>
> ~G
>
> On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris wrote:
>
>> Hello r users,
>>
>> I have to deal with int8 data with R. AFAIK R does only handle int4
>> with `as.integer` function [1]. I wonder:
>> 1. what is the better approach to handle int8 ? `as.character` ?
>> `as.numeric` ?
>> 2. is there any plan to handle int8 in the future ? As you might know,
>> int4 is too small to deal with earth population right now.
>>
>> Thanks for your ideas,
>>
>> int8 eg:
>>
>> human_id
>> --
>> -1311071933951566764
>> -4708675461424073238
>> -6865005668390999818
>> 5578000650960353108
>> -3219674686933841021
>> -6469229889308771589
>> -606871692563545028
>> -8199987422425699249
>> -463287495999648233
>> 7675955260644241951
>>
>> reference:
>> 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>>
>> --
>> Nicolas PARIS
>
> --
> Gabriel Becker, PhD
> Associate Scientist (Bioinformatics)
> Genentech Research

--
http://hadley.nz
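The trade-off between the first two options can be checked in base R. The sketch below takes two of the identifiers from the original question, coerces them to double, and formats them back; the variable names are chosen for this illustration only.

```r
# Two of the int8 identifiers from the original question, kept as strings.
ids <- c("-1311071933951566764", "5578000650960353108")

# Coerce to double, then print back with no decimals.
as_dbl <- as.numeric(ids)
round_trip <- sprintf("%.0f", as_dbl)

# Neither value survives the round trip: both are far beyond 2^53,
# where doubles can no longer represent every integer.
print(data.frame(original = ids, via_double = round_trip))
print(round_trip == ids)

# The boundary itself: 2^53 + 1 is not representable and collapses to 2^53.
print(2^53 + 1 == 2^53)
```

Keeping the identifiers as character vectors avoids this entirely, at the cost of the extra memory and load time described above.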
Re: [Rd] bug in rbind?
I'm not sure whether or not this is a bug, but I did isolate the line where the error is thrown: src/library/base/R/dataframe.R:1395.
https://github.com/wch/r-source/blob/01374c3c367fa12f555fd354f735a6e16e5bd98e/src/library/base/R/dataframe.R#L1395

The error is thrown because the line attempts to set a subset of the rownames to NULL, which fails.

R> options(error = recover)
R> rbind(dfm.names, dfm)
Error in rownames(value[[jj]])[ri] <- rownames(xij) :
  replacement has length zero

Enter a frame number, or 0 to exit

1: rbind(dfm.names, dfm)
2: rbind(deparse.level, ...)

Selection: 2
Called from: top level
Browse[1]> rownames(value[[jj]])
[1] "a" "b" "c" NA  NA  NA
Browse[1]> rownames(xij)
NULL
Browse[1]> ri
[1] 4 5 6
Browse[1]> rownames(value[[jj]])[ri]
[1] NA NA NA

On Mon, Jan 16, 2017 at 7:50 PM, Krzysztof Banas wrote:
> I suspect there may be a bug in base::rbind.data.frame
>
> Below is a minimal example of the problem:
>
> m <- matrix (1:12, 3)
> dfm <- data.frame (c = 1 : 3, m = I (m))
> str (dfm)
>
> m.names <- m
> rownames (m.names) <- letters [1:3]
> dfm.names <- data.frame (c = 1 : 3, m = I (m.names))
> str (dfm.names)
>
> rbind (m, m.names)
> rbind (m.names, m)
> rbind (dfm, dfm.names)
>
> #not working
> rbind (dfm.names, dfm)
>
> Error in rbind(deparse.level, ...) : replacement has length zero
>
> rbind (dfm, dfm.names)$m
>
>   [,1] [,2] [,3] [,4]
>      1    4    7   10
>      2    5    8   11
>      3    6    9   12
> a    1    4    7   10
> b    2    5    8   11
> c    3    6    9   12

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
R/Finance 2016 | www.rinfinance.com
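The failing assignment can be reproduced outside rbind() with plain vectors. This is a minimal sketch that reuses the names from the debugging session (xij, ri) but builds the values by hand; it is not the code in dataframe.R.

```r
# Row names collected so far: three real names, three placeholders.
rn <- c("a", "b", "c", NA, NA, NA)
ri <- 4:6

# A matrix without row names, like the 'm' column of 'dfm'.
xij <- matrix(1:12, 3)
print(is.null(rownames(xij)))  # TRUE: there are no row names to copy

# Assigning NULL into a subset of a vector is an error in R.
res <- tryCatch({
  rn[ri] <- rownames(xij)
  rn
}, error = function(e) e)

print(inherits(res, "error"))
print(conditionMessage(res))
```

This also suggests why the order of the arguments matters in the report: the assignment is only attempted when the row names being copied come from the matrix that has none.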
Re: [Rd] How to handle INT8 data
On 21 January 2017 at 10:56, Hadley Wickham wrote:
| To summarise this thread, there are basically three ways of handling int64 in R:
|
| * coerce to character
| * coerce to double
| * store in double
|
| ## Coerce to character

Serious performance loss.

| ## Coerce to double

Serious precision + functionality loss. Remember, int64, not int53, is what we are after. That is what the other systems we want to interoperate with have (bigtable indices).

| ## Store in a double

Best approach in my book, and done in bit64::integer64.

| This is the approach taken by the bit64 package (and, I believe, the

Incorrect. The int64 package used an S4 class with two int32. The bit64 package has a bit on the comparison. But as int64 is abandonware it doesn't matter either way.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org