>>>>> Spencer Graves >>>>> on Sun, 19 Jan 2020 21:35:04 -0600 writes:
> Thanks to Luke and Avi for their comments. I wrapped "round" around the > call to "rnorm" inside my "rpois.". For "lambda" really big, that > "round" won't do anything. However, it appears to give integers in > floating point representation that are larger than > .Machine$integer.max. That sounds very much like what someone would > want. Spencer Coming late here -- after enjoying a proper weekend ;-) -- I have been agreeing (with Spencer, IIUC) on this for a long time (~ 3 yrs, or more?), namely that I've come to see it as a "design bug" that rpois() {and similar} must return return typeof() "integer". More strongly, I'm actually pretty convinced they should return (integer-valued) double instead of NA_integer_ and for that reason should always return double: Even if we have (hopefully) a native 64bit integer in R, 2^64 is still teeny tiny compared .Machine$double.max (and then maybe we'd have .Machine$longdouble.max which would be considerably larger than double.max unless on Windows, where the wise men at Microsoft decided to keep their workload simple by defining "long double := double" - as 'long double' unfortunately is not well defined by C standards) Martin > On 2020-01-19 21:00, Tierney, Luke wrote: >> R uses the C 'int' type for its integer data and that is pretty much >> universally 32 bit these days. In fact R wont' compile if it is not. >> That means the range for integer data is the integers in [-2^31, >> +2^31). >> >> It would be good to allow for a larger integer range for R integer >> objects, and several of us are thinking about how me might get there. >> But it isn't easy to get right, so it may take some time. I doubt >> anything can happen for R 4.0.0 this year, but 2021 may be possible. >> >> I few notes inline below: >> >> On Sun, 19 Jan 2020, Spencer Graves wrote: >> >>> On my Mac: >>> >>> >>> str(.Machine) >>> ... >>> $ integer.max : int 2147483647 >>> $ sizeof.long : int 8 >>> $ sizeof.longlong : int 8 >>> $ sizeof.longdouble : int 16 >>> $ sizeof.pointer : int 8 >>> >>> >>> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise >>> the same as on my Mac. >> One of many annoyances of Windows -- done for compatibility with >> ancient Window apps. >> >>> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits? >>> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called >>> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble = >>> 16 = 4 words of 4 bytes each? >> double precision is a floating point concept, not related to integers. >> >> If you want to figure out whether you are running a 32 bit or 64 bit R >> look at sizeof.pointer -- 4 means 32 bits, 8 64 bits. >> >> Best, >> >> luke >> >> >>> >>> Spencer >>> >>> >>> On 2020-01-19 15:41, Avraham Adler wrote: >>>> Floor (maybe round) of non-negative numerics, though. Poisson should >>>> never have anything after decimal. >>>> >>>> Still think it’s worth allowing long long for R64 bit, just for purity >>>> sake. >>>> >>>> Avi >>>> >>>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves >>>> <spencer.gra...@prodsyse.com <mailto:spencer.gra...@prodsyse.com>> wrote: >>>> >>>> >>>> >>>> On 2020-01-19 13:01, Avraham Adler wrote: >>>>> Crazy thought, but being that a sum of Poissons is Poisson in the >>>>> sum, can you break your “big” simulation into the sum of a few >>>>> smaller ones? Or is the order of magnitude difference just too great? >>>> >>>> I don't perceive that as feasible. Once I found what was >>>> generating NAs, it was easy to code a function to return >>>> pseudo-random numbers using the standard normal approximation to >>>> the Poisson for those extreme cases. [For a Poisson with mean = >>>> 1e6, for example, the skewness (third standardized moment) is >>>> 0.001. At least for my purposes, that should be adequate.][1] >>>> >>>> >>>> What are the negative consequences of having rpois return >>>> numerics that are always nonnegative? >>>> >>>> >>>> Spencer >>>> >>>> >>>> [1] In the code I reported before, I just changed the threshold >>>> of 1e6 to 0.5*.Machine$integer.max. On my Mac, >>>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means >>>> that a Poisson distributed pseudo-random number just under that >>>> would have to be over 23000 standard deviations above the mean to >>>> exceed .Machine$integer.max. >>>> >>>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves >>>>> <spencer.gra...@prodsyse.com >>>>> <mailto:spencer.gra...@prodsyse.com>> wrote: >>>>> >>>>> This issue arose for me in simulations to estimate >>>>> confidence, prediction, and tolerance intervals from glm(., >>>>> family=poisson) fits embedded in a BMA::bic.glm fit using a >>>>> simulate.bic.glm function I added to the development version >>>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun" >>>>> <https://github.com/sbgraves237/Ecfun>. This is part of a >>>>> vignette I'm developing, available at >>>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd" >>>>> <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>. >>>>> This includes a simulated mean of a mixture of Poissons that >>>>> exceeds 2e22. It doesn't seem unreasonable to me to have >>>>> rpois output a numerics rather than integers when a number >>>>> simulated exceeds .Machine$integer.max. And it does seem to >>>>> make less sense in such cases to return NAs. >>>>> >>>>> >>>>> Alternatively, might it make sense to add another >>>>> argument to rpois to give the user the choice? E.g., an >>>>> argument "bigOutput" with (I hope) default = "numeric" and >>>>> "NA" as a second option. Or NA is the default, so no code >>>>> that relied that feature of the current code would be broken >>>>> by the change. If someone wanted to use arbitrary precision >>>>> arithmetic, they could write their own version of this >>>>> function with "arbitraryPrecision" as an optional value for >>>>> the "bigOutput" argument. >>>>> >>>>> >>>>> Comments? >>>>> Thanks, >>>>> Spencer Graves >>>>> >>>>> >>>>> >>>>> On 2020-01-19 10:28, Avraham Adler wrote: >>>>> Technically, lambda can always be numeric. It is the >>>>> observations which must be integral. >>>>>> >>>>> Would hitting everything larger than maxint or maxlonglong >>>>> with floor or round fundamentally change the distribution? >>>>> Well, yes, but enough that it would matter over process risk? >>>>>> >>>>> Avi >>>>>> >>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner >>>>> <bty...@gmail.com <mailto:bty...@gmail.com>> wrote: >>>>>> >>>>> So imagine rpois is changed, such that the storage mode >>>>> of its return >>>>> value is sometimes integer and sometimes numeric. Then >>>>> imagine the case >>>>> where lambda is itself a realization of a random >>>>> variable. Do we really >>>>> want the storage mode to inherit that randomness? >>>>>> >>>>>> >>>>> On 1/19/20 10:47 AM, Avraham Adler wrote: >>>>> > Maybe there should be code for 64 bit R to use long >>>>> long or the like? >>>>> > >>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves >>>>> > <spencer.gra...@prodsyse.com >>>>> <mailto:spencer.gra...@prodsyse.com> >>>>> <mailto:spencer.gra...@prodsyse.com >>>>> <mailto:spencer.gra...@prodsyse.com>>> wrote: >>>>> > >>>>> > >>>>> > >>>>> > On 2020-01-19 09:34, Benjamin Tyner wrote: >>>>> > >> >>>>> > >>>>> >>>>> ------------------------------------------------------------------------ >>>>> > >> Hello, All: >>>>> > >> >>>>> > >> >>>>> > >> Consider: >>>>> > >> >>>>> > >> >>>>> > >> Browse[2]> set.seed(1) >>>>> > >> Browse[2]> rpois(9, 1e10) >>>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA >>>>> > >> >>>>> > >> >>>>> > >> Should this happen? >>>>> > >> >>>>> > >> >>>>> > >> I think that for, say, lambda>1e6, >>>>> rpois should return >>>>> > rnorm(., >>>>> > >> lambda, sqrt(lambda)). >>>>> > > But need to implement carefully; rpois should >>>>> always return a >>>>> > > non-negative integer, whereas rnorm always >>>>> returns numeric... >>>>> > > >>>>> > >>>>> > Thanks for the reply. >>>>> > >>>>> > >>>>> > However, I think it's not acceptable to get >>>>> an NA from a >>>>> > number >>>>> > that cannot be expressed as an integer. Whenever >>>>> a randomly >>>>> > generated >>>>> > number would exceed .Machine$integer.max, the >>>>> choice is between >>>>> > returning NA or a non-integer numeric. Consider: >>>>> > >>>>> > >>>>> > > 2*.Machine$integer.max >>>>> > [1] 4294967294 >>>>> > > as.integer(2*.Machine$integer.max) >>>>> > [1] NA >>>>> > Warning message: >>>>> > NAs introduced by coercion to integer range >>>>> > >>>>> > >>>>> > I'd rather have the non-integer numeric. >>>>> > >>>>> > >>>>> > Spencer >>>>> > >>>>> > ______________________________________________ >>>>> > R-devel@r-project.org <mailto:R-devel@r-project.org> >>>>> <mailto:R-devel@r-project.org >>>>> <mailto:R-devel@r-project.org>> mailing list >>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>>>> > >>>>> > -- >>>>> > Sent from Gmail Mobile >>>>>> >>>>> -- >>>>> Sent from Gmail Mobile >>>>> -- >>>>> Sent from Gmail Mobile >>>> -- >>>> Sent from Gmail Mobile >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel