Hi, I have a long numeric vector 'xx' and I want to use sum() to count the elements that satisfy some criterion, such as being non-zero or being lower than a certain threshold.
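For reference, here is the idiom being discussed. When sum() is applied to a logical vector, TRUE counts as 1 and FALSE as 0, so the result is the number of elements meeting the condition, returned as an integer. This is a minimal sketch on a small made-up vector; the values are invented for illustration and are not from the original message.

```r
# Small, made-up stand-in for the long vector 'xx'
xx <- c(0, 0.5, 1.2, 0, 3.7, 0.95)

sum(xx != 0)           # number of non-zero values: 4
sum(xx < 0.9)          # number of values below the threshold 0.9: 3
typeof(sum(xx < 0.9))  # "integer": sum() of a logical vector is an integer
```

Because the result is an integer, it is bounded by .Machine$integer.max, which is what causes the problem with very long vectors.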
The problem is that sum() returns NA (with a warning) when the count is greater than 2^31 - 1 (.Machine$integer.max). For example:

  > xx <- runif(3e9)
  > sum(xx < 0.9)
  [1] NA
  Warning message:
  In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

This already takes a long time, and doing sum(as.numeric(.)) would take even longer and would require allocating 24 GB of memory just to store an intermediate numeric vector of 0s and 1s. Plus, having to write sum(as.numeric(.)) every time I need to count things is inconvenient and easy to forget.

It seems that sum() on a logical vector could be modified to return the count as a double when it cannot be represented as an integer. Note that length() already does this (it returns a double for long vectors), so this would not set a precedent. Also, FWIW, prod() avoids the problem by always returning a double, whatever the type of the input (except for a complex vector).

I can provide a patch if this change sounds reasonable.

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
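Until sum() itself changes, one possible workaround is to count in fixed-size chunks, so that no full-length logical or numeric intermediate is ever allocated and each per-chunk integer sum stays far below .Machine$integer.max. This is a sketch under stated assumptions, not a proposed patch: it assumes the criterion can be written as a vectorized predicate function, and `count_true()` and its `chunk_size` argument are hypothetical names, not part of base R. It accumulates in a double (exact for counts up to 2^53) and, like length(), returns an integer when the count fits and a double otherwise.

```r
# Hypothetical helper (not base R): count the TRUE values of pred(x)
# chunk by chunk, without materializing a logical or numeric vector
# as long as 'x'.
count_true <- function(x, pred, chunk_size = 1e8) {
  n <- length(x)   # a double when 'x' is a long vector
  total <- 0       # double accumulator: exact for counts up to 2^53
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # Each chunk has at most 'chunk_size' elements, so this integer sum
    # cannot overflow; adding it to a double keeps 'total' a double.
    # An NA produced by 'pred' propagates into 'total', as with sum().
    total <- total + sum(pred(x[start:end]))
    start <- end + 1
  }
  # Mirror length(): integer when the count fits, double otherwise
  # (an NA count is returned as NA_real_).
  if (is.na(total) || total > .Machine$integer.max) total else as.integer(total)
}
```

For example, count_true(xx, function(v) v < 0.9) counts the same elements as sum(xx < 0.9) but only ever holds one chunk's worth of intermediates in memory. The chunk size trades peak memory against per-iteration overhead; 1e8 is an arbitrary default chosen for this sketch.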