I second this feature request (it's understandable that this and
possibly other parts of the code were left behind / forgotten after the
introduction of long vectors).

I think mean() avoids full copies, so in the meantime, you can work
around this limitation using:

    countTRUE <- function(x, na.rm = FALSE) {
      nx <- length(x)
      ## Short enough that the count always fits in an integer
      if (nx < .Machine$integer.max) return(sum(x, na.rm = na.rm))
      ## Caveat: with na.rm = TRUE and NAs present, mean() divides by the
      ## number of non-NA elements, so nx * mean() would overcount
      nx * mean(x, na.rm = na.rm)
    }

(not sure if one needs to worry about rounding errors, i.e. where
n %% 1 != 0)

    x <- rep(TRUE, times = .Machine$integer.max+1)
    object.size(x)
    ## 8589934632 bytes

    p <- profmem::profmem( n <- countTRUE(x) )
    str(n)
    ## num 2.15e+09
    print(n == .Machine$integer.max + 1)
    ## [1] TRUE
    print(p)
    ## Rprofmem memory profiling of:
    ## n <- countTRUE(x)
    ##
    ## Memory allocations:
    ##       bytes calls
    ## total     0

FYI / related: I've just updated matrixStats::sum2() to support
logicals (develop branch) and I'll also try to update
matrixStats::count() to count beyond .Machine$integer.max.

/Henrik

On Fri, Jun 2, 2017 at 4:05 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi,
>
> I have a long numeric vector 'xx' and I want to use sum() to count
> the number of elements that satisfy some criteria like non-zero
> values or values lower than a certain threshold etc...
>
> The problem is: sum() returns an NA (with a warning) if the count
> is greater than 2^31. For example:
>
>   > xx <- runif(3e9)
>   > sum(xx < 0.9)
>   [1] NA
>   Warning message:
>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))
>
> This already takes a long time and doing sum(as.numeric(.)) would
> take even longer and require allocation of 24Gb of memory just to
> store an intermediate numeric vector made of 0s and 1s. Plus, having
> to do sum(as.numeric(.)) every time I need to count things is not
> convenient and is easy to forget.
>
> It seems that sum() on a logical vector could be modified to return
> the count as a double when it cannot be represented as an integer.
> Note that length() already does this so that wouldn't create a
> precedent. Also and FWIW prod() avoids the problem by always returning
> a double, whatever the type of the input is (except on a complex
> vector).
>
> I can provide a patch if this change sounds reasonable.
>
> Cheers,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
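
Addendum: the countTRUE() workaround above leaves open a rounding question, and with na.rm = TRUE it can overcount when NAs are present. A chunked count sidesteps both at the cost of a small temporary copy per chunk. The following is only a minimal sketch: countTRUE_chunked() and its chunk_size argument are hypothetical names, not part of base R or matrixStats, and the sketch trades speed for bounded memory use.

```r
## Hypothetical helper: count TRUE values chunk by chunk, accumulating
## the running total in a double so it can exceed .Machine$integer.max.
countTRUE_chunked <- function(x, na.rm = FALSE, chunk_size = 1e6) {
  stopifnot(is.logical(x),
            chunk_size >= 1,
            chunk_size <= .Machine$integer.max)
  nx <- length(x)
  total <- 0   # double accumulator; exact for integer counts up to 2^53
  start <- 1
  while (start <= nx) {
    end <- min(start + chunk_size - 1, nx)
    ## A chunk holds at most chunk_size elements, so this integer sum
    ## cannot overflow; only the chunk (and its index) is allocated.
    total <- total + sum(x[start:end], na.rm = na.rm)
    start <- end + 1
  }
  total
}
```

Because each partial sum is an exact integer and the accumulator is a double, no fractional result can arise, and NAs are handled by sum() itself: they are dropped per chunk with na.rm = TRUE and propagate to the total otherwise.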