[Rd] bug in sum() on integer vector
Hi,

x <- c(rep(180003L, 1000), -rep(120002L, 1500))

This is correct:

> sum(as.double(x))
[1] 0

This is not:

> sum(x)
[1] 4996000

Returning NA (with a warning) would also be acceptable for the latter.
That would make it consistent with cumsum(x):

> cumsum(x)[length(x)]
[1] NA
Warning message:
Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

Thanks!
H.

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
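Hervé suggests that sum(x) could instead return NA with a warning, as cumsum() does. A minimal C sketch of that behaviour follows, assuming the caller is responsible for issuing the warning and that x contains no NA values. `checked_isum` and `NA_INT` are hypothetical names, not R internals. The sketch checks every running total against R's integer range [-INT_MAX, INT_MAX], since R reserves INT_MIN for NA_integer_.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for R's NA_INTEGER, which R stores as INT_MIN. */
#define NA_INT INT_MIN

/* cumsum()-style integer sum: as soon as a running total leaves
   [-INT_MAX, INT_MAX], give up and return NA_INT.
   Assumption: x contains no NA values (they are not treated specially). */
static int checked_isum(const int *x, long n)
{
    assert(n >= 0);
    /* s always starts a step within int range, so adding one int
       cannot overflow a long long. */
    long long s = 0;
    for (long i = 0; i < n; i++) {
        s += x[i];
        if (s > INT_MAX || s <= INT_MIN)
            return NA_INT;  /* caller should warn, as cumsum() does */
    }
    return (int) s;
}
```

Applied to the x from the report, checked_isum() returns 0. Like cumsum(), it also returns NA when an intermediate total overflows even though the final total would fit (for example INT_MAX, 1, -1). A wider accumulator with a single range check at the end would accept such input.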
Re: [Rd] bug in sum() on integer vector
On 09/12/2011 1:40 PM, Hervé Pagès wrote:
> [...]

This is a 64-bit problem; in 32 bits, things work out properly. I'd guess
that in 64-bit arithmetic we or the run-time are doing something to simulate
32-bit arithmetic (since integers are 32 bits), but it looks as though we're
not quite getting it right.

Duncan Murdoch
Re: [Rd] bug in sum() on integer vector
Hi Duncan,

On 11-12-09 11:39 AM, Duncan Murdoch wrote:
> This is a 64 bit problem; in 32 bits things work out properly. [...]

It doesn't work properly for me on Leopard (32-bit mode):

> x <- c(rep(180003L, 1000), -rep(120002L, 1500))
> sum(as.double(x))
[1] 0
> sum(x)
[1] 4996000
> sessionInfo()
R version 2.14.0 RC (2011-10-27 r57452)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

It looks like the problem is that isum() (in src/main/summary.c) uses a
'double' internally to do the sum, whereas rsum() and csum() use a
'long double'.

Note that isum() seems to assume that NA_INTEGER and NA_LOGICAL will always
be the same (probably fine) and that TRUE values in the input vector are
always represented as 1 (not so sure about this one).

A more fundamental question: is switching back and forth between 'int' and
'double' (or 'long double') the right way to do "safe" arithmetic on
integers?

Thanks!
H.
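The accumulator-type point can be made concrete with a small C sketch. It is not R's code: `sum_double`, `sum_int64`, `to_r_int` and `NA_INT` are hypothetical names. A 'double' accumulator represents every integer exactly only up to 2^53 in magnitude, so once a running total passes that bound, additions are rounded silently. Accumulating in a 64-bit integer and doing one range check at the end is shown as one possible answer to the "safe arithmetic" question, an assumption rather than what R-core chose. A 'long double' accumulator only helps where that type is actually wider than 'double', which depends on the platform and compiler.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for R's NA_INTEGER (INT_MIN in R). */
#define NA_INT INT_MIN

/* Accumulate in a double, as the message says isum() does.
   Integer partial sums are exact only while |s| <= 2^53. */
static double sum_double(const int *x, long n)
{
    assert(n >= 0);
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Accumulate in a 64-bit integer instead. Every partial sum is exact
   as long as n < 2^32, because |x[i]| <= 2^31 then keeps |s| < 2^63. */
static int64_t sum_int64(const int *x, long n)
{
    assert(n >= 0);
    int64_t s = 0;
    for (long i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Convert the exact total to an R-style integer, or NA if it does not
   fit (INT_MIN is excluded because R reserves it for NA). */
static int to_r_int(int64_t s)
{
    return (s > INT_MAX || s <= INT_MIN) ? NA_INT : (int) s;
}
```

For example, 2^22 + 1 copies of INT_MAX followed by as many copies of -INT_MAX sum exactly to 0. With strict IEEE 754 double arithmetic (for example SSE2 on x86-64), the running 'double' total is rounded once it passes 2^53, so sum_double() returns 1.0 while sum_int64() returns 0.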