Re: [Rd] [External] Re: rpois(9, 1e10)

Spencer Graves Sun, 19 Jan 2020 19:36:03 -0800

Thanks to Luke and Avi for their comments. I wrapped "round" around thecall to "rnorm" inside my "rpois.". For "lambda" really big, that"round" won't do anything. However, it appears to give integers infloating point representation that are larger than.Machine$integer.max. That sounds very much like what someone wouldwant. Spencer


On 2020-01-19 21:00, Tierney, Luke wrote:

R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R wont' compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how me might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

I few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:

On my Mac:


str(.Machine)
...
$ integer.max          : int 2147483647
  $ sizeof.long          : int 8
  $ sizeof.longlong      : int 8
  $ sizeof.longdouble    : int 16
  $ sizeof.pointer       : int 8


       On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Window apps.

       Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 64 bits.

Best,

luke


       Spencer


On 2020-01-19 15:41, Avraham Adler wrote:

Floor (maybe round) of non-negative numerics, though. Poisson should
never have anything after decimal.

Still think it’s worth allowing long long for R64 bit, just for purity
sake.

Avi

On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
<spencer.gra...@prodsyse.com <mailto:spencer.gra...@prodsyse.com>> wrote:



     On 2020-01-19 13:01, Avraham Adler wrote:

     Crazy thought, but being that a sum of Poissons is Poisson in the
     sum, can you break your “big” simulation into the sum of a few
     smaller ones? Or is the order of magnitude difference just too great?


           I don't perceive that as feasible.  Once I found what was
     generating NAs, it was easy to code a function to return
     pseudo-random numbers using the standard normal approximation to
     the Poisson for those extreme cases.  [For a Poisson with mean =
     1e6, for example, the skewness (third standardized moment) is
     0.001.  At least for my purposes, that should be adequate.][1]


           What are the negative consequences of having rpois return
     numerics that are always nonnegative?


           Spencer


     [1]  In the code I reported before, I just changed the threshold
     of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
     .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
     that a Poisson distributed pseudo-random number just under that
     would have to be over 23000 standard deviations above the mean to
     exceed .Machine$integer.max.

     On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
     <spencer.gra...@prodsyse.com
     <mailto:spencer.gra...@prodsyse.com>> wrote:

               This issue arose for me in simulations to estimate
         confidence, prediction, and tolerance intervals from glm(.,
         family=poisson) fits embedded in a BMA::bic.glm fit using a
         simulate.bic.glm function I added to the development version
         of Ecfun, available at "https://github.com/sbgraves237/Ecfun";
         <https://github.com/sbgraves237/Ecfun>. This is part of a
         vignette I'm developing, available at
         
"https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
         
<https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
         This includes a simulated mean of a mixture of Poissons that
         exceeds 2e22.  It doesn't seem unreasonable to me to have
         rpois output a numerics rather than integers when a number
         simulated exceeds .Machine$integer.max.  And it does seem to
         make less sense in such cases to return NAs.


                Alternatively, might it make sense to add another
         argument to rpois to give the user the choice?  E.g., an
         argument "bigOutput" with (I hope) default = "numeric" and
         "NA" as a second option.  Or NA is the default, so no code
         that relied that feature of the current code would be broken
         by the change.  If someone wanted to use arbitrary precision
         arithmetic, they could write their own version of this
         function with "arbitraryPrecision" as an optional value for
         the "bigOutput" argument.


               Comments?
               Thanks,
               Spencer Graves



         On 2020-01-19 10:28, Avraham Adler wrote:

         Technically, lambda can always be numeric. It is the
         observations which must be integral.

         Would hitting everything larger than maxint or maxlonglong
         with floor or round fundamentally change the distribution?
         Well, yes, but enough that it would matter over process risk?

         Avi

         On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
         <bty...@gmail.com <mailto:bty...@gmail.com>> wrote:

             So imagine rpois is changed, such that the storage mode
             of its return
             value is sometimes integer and sometimes numeric. Then
             imagine the case
             where lambda is itself a realization of a random
             variable. Do we really
             want the storage mode to inherit that randomness?


             On 1/19/20 10:47 AM, Avraham Adler wrote:
            > Maybe there should be code for 64 bit R to use long
             long or the like?
            >
            > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
            > <spencer.gra...@prodsyse.com
             <mailto:spencer.gra...@prodsyse.com>
             <mailto:spencer.gra...@prodsyse.com
             <mailto:spencer.gra...@prodsyse.com>>> wrote:
            >
            >
            >
            >     On 2020-01-19 09:34, Benjamin Tyner wrote:
            >     >>
            >
              
------------------------------------------------------------------------
            >     >> Hello, All:
            >     >>
            >     >>
            >     >>         Consider:
            >     >>
            >     >>
            >     >> Browse[2]> set.seed(1)
            >     >> Browse[2]> rpois(9, 1e10)
            >     >> NAs produced[1] NA NA NA NA NA NA NA NA NA
            >     >>
            >     >>
            >     >>         Should this happen?
            >     >>
            >     >>
            >     >>         I think that for, say, lambda>1e6,
             rpois should return
            >     rnorm(.,
            >     >> lambda, sqrt(lambda)).
            >     > But need to implement carefully; rpois should
             always return a
            >     > non-negative integer, whereas rnorm always
             returns numeric...
            >     >
            >
            >            Thanks for the reply.
            >
            >
            >            However, I think it's not acceptable to get
             an NA from a
            >     number
            >     that cannot be expressed as an integer.  Whenever
             a randomly
            >     generated
            >     number would exceed .Machine$integer.max, the
             choice is between
            >     returning NA or a non-integer numeric.  Consider:
            >
            >
            >      > 2*.Machine$integer.max
            >     [1] 4294967294
            >      > as.integer(2*.Machine$integer.max)
            >     [1] NA
            >     Warning message:
            >     NAs introduced by coercion to integer range
            >
            >
            >            I'd rather have the non-integer numeric.
            >
            >
            >            Spencer
            >
            >  ______________________________________________
            > R-devel@r-project.org <mailto:R-devel@r-project.org>
             <mailto:R-devel@r-project.org
             <mailto:R-devel@r-project.org>> mailing list
            > https://stat.ethz.ch/mailman/listinfo/r-devel
            >
            > --
            > Sent from Gmail Mobile

         --
         Sent from Gmail Mobile

     --
     Sent from Gmail Mobile

--
Sent from Gmail Mobile


        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: rpois(9, 1e10)

Reply via email to