>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>>     on Tue, 6 Jun 2017 09:45:44 +0200 writes:
>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>>     on Fri, 2 Jun 2017 04:05:15 -0700 writes:

>> Hi,

>> I have a long numeric vector 'xx' and I want to use
>> sum() to count the number of elements that satisfy some
>> criteria like non-zero values or values lower than a
>> certain threshold etc...

>> The problem is: sum() returns an NA (with a warning) if
>> the count is greater than 2^31. For example:

>>   > xx <- runif(3e9)
>>   > sum(xx < 0.9)
>>   [1] NA
>>   Warning message:
>>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

>> This already takes a long time and doing
>> sum(as.numeric(.)) would take even longer and require
>> allocation of 24Gb of memory just to store an
>> intermediate numeric vector made of 0s and 1s. Plus,
>> having to do sum(as.numeric(.)) every time I need to
>> count things is not convenient and is easy to forget.

>> It seems that sum() on a logical vector could be modified
>> to return the count as a double when it cannot be
>> represented as an integer. Note that length() already
>> does this so that wouldn't create a precedent. Also and
>> FWIW prod() avoids the problem by always returning a
>> double, whatever the type of the input is (except on a
>> complex vector).

>> I can provide a patch if this change sounds reasonable.

> This sounds very reasonable, thank you Hervé, for the
> report, and even more for a (small) patch.

I was made aware of the fact that R very often treats logical and
integer identically in the C code, and in general we even mention
that logicals are treated as 0/1/NA integers in arithmetic.

For the present case that would mean that we should also safeguard
against *integer* overflow in sum(.), and that is not something we
have done / wanted to do in the past... speed being one reason.

So this ends up being more delicate than I had thought at first,
because changing sum(<logical>) only would mean that sum(LOGI) and
sum(as.integer(LOGI)) would start to differ for a logical vector LOGI.
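
To make the point above concrete, here is a small illustration. It is only a sketch and not how R implements sum() internally. The helper count_true() and its chunk_size default are hypothetical names introduced for this example: it counts in chunks and accumulates in a double, so it needs neither the integer result nor the full as.numeric() copy.

```r
# Today, logical and integer input give the same result in sum():
lgl <- c(TRUE, FALSE, TRUE)
identical(sum(lgl), sum(as.integer(lgl)))            # TRUE

# An integer sum that does not fit in an integer is NA (with a warning):
suppressWarnings(sum(c(.Machine$integer.max, 1L)))   # NA

# Hypothetical helper (not part of R): count the elements of 'x' for which
# 'pred' is TRUE, one chunk at a time. Each per-chunk sum is a small
# integer; the running total is a double, which is exact up to 2^53.
count_true <- function(x, pred, chunk_size = 1e7) {
  stopifnot(chunk_size >= 1, chunk_size <= .Machine$integer.max)
  n <- length(x)
  total <- 0          # double accumulator
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # An NA in the predicate result propagates, as it does in sum()
    total <- total + sum(pred(x[start:end]))
    start <- end + 1
  }
  total
}

# Small usage example (a real use case would pass a long vector)
xx <- c(0.1, 0.95, 0.5, 0.99, 0.2)
count_true(xx, function(v) v < 0.9, chunk_size = 2)  # 3
```

The chunk size trades memory for loop overhead; the value used here is an arbitrary assumption, not a tuned recommendation.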

So, for now this is something that must be approached carefully,
and the R Core team may want to discuss it "in private" first.

I'm sorry for having raised possibly unrealistic expectations.

Martin

> Martin

>> Cheers,
>> H.

>> --
>> Hervé Pagès

>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024

>> E-mail: hpa...@fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel