[Rd] rpois(9, 1e10)

2020-01-19 Thread Spencer Graves

Hello, All:


  Consider:


Browse[2]> set.seed(1)
Browse[2]> rpois(9, 1e10)
NAs produced[1] NA NA NA NA NA NA NA NA NA


  Should this happen?


  I think that for, say, lambda>1e6, rpois should return rnorm(., 
lambda, sqrt(lambda)).



  For my particular Monte Carlo, I have replaced my call to rpois 
with a call to the following:



 rpois. <- function(n, lambda){
  n2 <- max(length(n), length(lambda))
  n <- rep_len(n, n2)
  lambda <- rep_len(lambda, n2)
#
  big <- (lambda>1e6)
  out <- rep(NA, n2)
  out[big] <- rnorm(sum(big), lambda[big], sqrt(lambda[big]))
  out[!big] <- rpois(sum(!big), lambda[!big])
  out
  }


  Comments?
  Thanks,
  Spencer Graves

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Benjamin Tyner


Hello, All:


    Consider:


Browse[2]> set.seed(1)
Browse[2]> rpois(9, 1e10)
NAs produced[1] NA NA NA NA NA NA NA NA NA


    Should this happen?


    I think that for, say, lambda>1e6, rpois should return rnorm(.,
lambda, sqrt(lambda)).
But need to implement carefully; rpois should always return a 
non-negative integer, whereas rnorm always returns numeric...


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Spencer Graves




On 2020-01-19 09:34, Benjamin Tyner wrote:


Hello, All:


    Consider:


Browse[2]> set.seed(1)
Browse[2]> rpois(9, 1e10)
NAs produced[1] NA NA NA NA NA NA NA NA NA


    Should this happen?


    I think that for, say, lambda>1e6, rpois should return rnorm(.,
lambda, sqrt(lambda)).
But need to implement carefully; rpois should always return a 
non-negative integer, whereas rnorm always returns numeric...




  Thanks for the reply.


  However, I think it's not acceptable to get an NA from a number 
that cannot be expressed as an integer.  Whenever a randomly generated 
number would exceed .Machine$integer.max, the choice is between 
returning NA or a non-integer numeric.  Consider:



> 2*.Machine$integer.max
[1] 4294967294
> as.integer(2*.Machine$integer.max)
[1] NA
Warning message:
NAs introduced by coercion to integer range


  I'd rather have the non-integer numeric.


  Spencer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Avraham Adler
Maybe there should be code for 64 bit R to use long long or the like?

On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves 
wrote:

>
>
> On 2020-01-19 09:34, Benjamin Tyner wrote:
> >> 
> >> Hello, All:
> >>
> >>
> >> Consider:
> >>
> >>
> >> Browse[2]> set.seed(1)
> >> Browse[2]> rpois(9, 1e10)
> >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> >>
> >>
> >> Should this happen?
> >>
> >>
> >> I think that for, say, lambda>1e6, rpois should return rnorm(.,
> >> lambda, sqrt(lambda)).
> > But need to implement carefully; rpois should always return a
> > non-negative integer, whereas rnorm always returns numeric...
> >
>
>Thanks for the reply.
>
>
>However, I think it's not acceptable to get an NA from a number
> that cannot be expressed as an integer.  Whenever a randomly generated
> number would exceed .Machine$integer.max, the choice is between
> returning NA or a non-integer numeric.  Consider:
>
>
>  > 2*.Machine$integer.max
> [1] 4294967294
>  > as.integer(2*.Machine$integer.max)
> [1] NA
> Warning message:
> NAs introduced by coercion to integer range
>
>
>I'd rather have the non-integer numeric.
>
>
>Spencer
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Sent from Gmail Mobile

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Benjamin Tyner
So imagine rpois is changed, such that the storage mode of its return 
value is sometimes integer and sometimes numeric. Then imagine the case 
where lambda is itself a realization of a random variable. Do we really 
want the storage mode to inherit that randomness?



On 1/19/20 10:47 AM, Avraham Adler wrote:

Maybe there should be code for 64 bit R to use long long or the like?

On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves 
mailto:spencer.gra...@prodsyse.com>> wrote:




On 2020-01-19 09:34, Benjamin Tyner wrote:
>>

>> Hello, All:
>>
>>
>>     Consider:
>>
>>
>> Browse[2]> set.seed(1)
>> Browse[2]> rpois(9, 1e10)
>> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>
>>
>>     Should this happen?
>>
>>
>>     I think that for, say, lambda>1e6, rpois should return
rnorm(.,
>> lambda, sqrt(lambda)).
> But need to implement carefully; rpois should always return a
> non-negative integer, whereas rnorm always returns numeric...
>

   Thanks for the reply.


   However, I think it's not acceptable to get an NA from a
number
that cannot be expressed as an integer.  Whenever a randomly
generated
number would exceed .Machine$integer.max, the choice is between
returning NA or a non-integer numeric.  Consider:


 > 2*.Machine$integer.max
[1] 4294967294
 > as.integer(2*.Machine$integer.max)
[1] NA
Warning message:
NAs introduced by coercion to integer range


   I'd rather have the non-integer numeric.


   Spencer

__
R-devel@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Sent from Gmail Mobile


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Avraham Adler
Technically, lambda can always be numeric. It is the observations which
must be integral.

Would hitting everything larger than maxint or maxlonglong with floor or
round fundamentally change the distribution? Well, yes, but enough that it
would matter over process risk?

Avi

On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner  wrote:

> So imagine rpois is changed, such that the storage mode of its return
> value is sometimes integer and sometimes numeric. Then imagine the case
> where lambda is itself a realization of a random variable. Do we really
> want the storage mode to inherit that randomness?
>
>
> On 1/19/20 10:47 AM, Avraham Adler wrote:
> > Maybe there should be code for 64 bit R to use long long or the like?
> >
> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
> > mailto:spencer.gra...@prodsyse.com>>
> wrote:
> >
> >
> >
> > On 2020-01-19 09:34, Benjamin Tyner wrote:
> > >>
> >
>  
> > >> Hello, All:
> > >>
> > >>
> > >> Consider:
> > >>
> > >>
> > >> Browse[2]> set.seed(1)
> > >> Browse[2]> rpois(9, 1e10)
> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> > >>
> > >>
> > >> Should this happen?
> > >>
> > >>
> > >> I think that for, say, lambda>1e6, rpois should return
> > rnorm(.,
> > >> lambda, sqrt(lambda)).
> > > But need to implement carefully; rpois should always return a
> > > non-negative integer, whereas rnorm always returns numeric...
> > >
> >
> >Thanks for the reply.
> >
> >
> >However, I think it's not acceptable to get an NA from a
> > number
> > that cannot be expressed as an integer.  Whenever a randomly
> > generated
> > number would exceed .Machine$integer.max, the choice is between
> > returning NA or a non-integer numeric.  Consider:
> >
> >
> >  > 2*.Machine$integer.max
> > [1] 4294967294
> >  > as.integer(2*.Machine$integer.max)
> > [1] NA
> > Warning message:
> > NAs introduced by coercion to integer range
> >
> >
> >I'd rather have the non-integer numeric.
> >
> >
> >Spencer
> >
> > __
> > R-devel@r-project.org  mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Sent from Gmail Mobile
>
-- 
Sent from Gmail Mobile

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Spencer Graves
   This issue arose for me in simulations to estimate confidence, 
prediction, and tolerance intervals from glm(., family=poisson) fits 
embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added 
to the development version of Ecfun, available at 
"https://github.com/sbgraves237/Ecfun".  This is part of a vignette I'm 
developing, available at 
"https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";.
 
This includes a simulated mean of a mixture of Poissons that exceeds 
2e22.  It doesn't seem unreasonable to me to have rpois output a 
numerics rather than integers when a number simulated exceeds 
.Machine$integer.max.  And it does seem to make less sense in such cases 
to return NAs.


    Alternatively, might it make sense to add another argument to 
rpois to give the user the choice?  E.g., an argument "bigOutput" with 
(I hope) default = "numeric" and "NA" as a second option.  Or NA is the 
default, so no code that relied that feature of the current code would 
be broken by the change.  If someone wanted to use arbitrary precision 
arithmetic, they could write their own version of this function with 
"arbitraryPrecision" as an optional value for the "bigOutput" argument.


   Comments?
   Thanks,
   Spencer Graves


On 2020-01-19 10:28, Avraham Adler wrote:
> Technically, lambda can always be numeric. It is the observations 
> which must be integral.
>
> Would hitting everything larger than maxint or maxlonglong with floor 
> or round fundamentally change the distribution? Well, yes, but enough 
> that it would matter over process risk?
>
> Avi
>
> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner  > wrote:
>
> So imagine rpois is changed, such that the storage mode of its return
> value is sometimes integer and sometimes numeric. Then imagine the
> case
> where lambda is itself a realization of a random variable. Do we
> really
> want the storage mode to inherit that randomness?
>
>
> On 1/19/20 10:47 AM, Avraham Adler wrote:
> > Maybe there should be code for 64 bit R to use long long or the
> like?
> >
> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
> >  
>  >> wrote:
> >
> >
> >
> >     On 2020-01-19 09:34, Benjamin Tyner wrote:
> >     >>
> >
>  
> >     >> Hello, All:
> >     >>
> >     >>
> >     >>     Consider:
> >     >>
> >     >>
> >     >> Browse[2]> set.seed(1)
> >     >> Browse[2]> rpois(9, 1e10)
> >     >> NAs produced[1] NA NA NA NA NA NA NA NA NA
> >     >>
> >     >>
> >     >>     Should this happen?
> >     >>
> >     >>
> >     >>     I think that for, say, lambda>1e6, rpois should
> return
> >     rnorm(.,
> >     >> lambda, sqrt(lambda)).
> >     > But need to implement carefully; rpois should always return a
> >     > non-negative integer, whereas rnorm always returns numeric...
> >     >
> >
> >        Thanks for the reply.
> >
> >
> >        However, I think it's not acceptable to get an NA from a
> >     number
> >     that cannot be expressed as an integer.  Whenever a randomly
> >     generated
> >     number would exceed .Machine$integer.max, the choice is between
> >     returning NA or a non-integer numeric.  Consider:
> >
> >
> >      > 2*.Machine$integer.max
> >     [1] 4294967294
> >      > as.integer(2*.Machine$integer.max)
> >     [1] NA
> >     Warning message:
> >     NAs introduced by coercion to integer range
> >
> >
> >        I'd rather have the non-integer numeric.
> >
> >
> >        Spencer
> >
> >     __
> > R-devel@r-project.org 
> >
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Sent from Gmail Mobile
>
> -- 
> Sent from Gmail Mobile


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Avraham Adler
Crazy thought, but being that a sum of Poissons is Poisson in the sum, can
you break your “big” simulation into the sum of a few smaller ones? Or is
the order of magnitude difference just too great?

On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves 
wrote:

>   This issue arose for me in simulations to estimate confidence,
> prediction, and tolerance intervals from glm(., family=poisson) fits
> embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added to
> the development version of Ecfun, available at
> "https://github.com/sbgraves237/Ecfun";
> .  This is part of a vignette I'm
> developing, available at
> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
> .
> This includes a simulated mean of a mixture of Poissons that exceeds 2e22.
> It doesn't seem unreasonable to me to have rpois output a numerics rather
> than integers when a number simulated exceeds .Machine$integer.max.  And it
> does seem to make less sense in such cases to return NAs.
>
>
>Alternatively, might it make sense to add another argument to rpois
> to give the user the choice?  E.g., an argument "bigOutput" with (I hope)
> default = "numeric" and "NA" as a second option.  Or NA is the default, so
> no code that relied that feature of the current code would be broken by the
> change.  If someone wanted to use arbitrary precision arithmetic, they
> could write their own version of this function with "arbitraryPrecision" as
> an optional value for the "bigOutput" argument.
>
>
>   Comments?
>   Thanks,
>   Spencer Graves
>
>
>
> On 2020-01-19 10:28, Avraham Adler wrote:
>
> Technically, lambda can always be numeric. It is the observations which
> must be integral.
>
> Would hitting everything larger than maxint or maxlonglong with floor or
> round fundamentally change the distribution? Well, yes, but enough that it
> would matter over process risk?
>
> Avi
>
> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner  wrote:
>
>> So imagine rpois is changed, such that the storage mode of its return
>> value is sometimes integer and sometimes numeric. Then imagine the case
>> where lambda is itself a realization of a random variable. Do we really
>> want the storage mode to inherit that randomness?
>>
>>
>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>> > Maybe there should be code for 64 bit R to use long long or the like?
>> >
>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>> > mailto:spencer.gra...@prodsyse.com>>
>> wrote:
>> >
>> >
>> >
>> > On 2020-01-19 09:34, Benjamin Tyner wrote:
>> > >>
>> >
>>  
>> > >> Hello, All:
>> > >>
>> > >>
>> > >> Consider:
>> > >>
>> > >>
>> > >> Browse[2]> set.seed(1)
>> > >> Browse[2]> rpois(9, 1e10)
>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>> > >>
>> > >>
>> > >> Should this happen?
>> > >>
>> > >>
>> > >> I think that for, say, lambda>1e6, rpois should return
>> > rnorm(.,
>> > >> lambda, sqrt(lambda)).
>> > > But need to implement carefully; rpois should always return a
>> > > non-negative integer, whereas rnorm always returns numeric...
>> > >
>> >
>> >Thanks for the reply.
>> >
>> >
>> >However, I think it's not acceptable to get an NA from a
>> > number
>> > that cannot be expressed as an integer.  Whenever a randomly
>> > generated
>> > number would exceed .Machine$integer.max, the choice is between
>> > returning NA or a non-integer numeric.  Consider:
>> >
>> >
>> >  > 2*.Machine$integer.max
>> > [1] 4294967294
>> >  > as.integer(2*.Machine$integer.max)
>> > [1] NA
>> > Warning message:
>> > NAs introduced by coercion to integer range
>> >
>> >
>> >I'd rather have the non-integer numeric.
>> >
>> >
>> >Spencer
>> >
>> > __
>> > R-devel@r-project.org  mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> > --
>> > Sent from Gmail Mobile
>>
> --
> Sent from Gmail Mobile
>
>
> --
Sent from Gmail Mobile

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Spencer Graves



On 2020-01-19 13:01, Avraham Adler wrote:
> Crazy thought, but being that a sum of Poissons is Poisson in the sum, 
> can you break your “big” simulation into the sum of a few smaller 
> ones? Or is the order of magnitude difference just too great?


   I don't perceive that as feasible.  Once I found what was 
generating NAs, it was easy to code a function to return pseudo-random 
numbers using the standard normal approximation to the Poisson for those 
extreme cases.  [For a Poisson with mean = 1e6, for example, the 
skewness (third standardized moment) is 0.001.  At least for my 
purposes, that should be adequate.][1]


   What are the negative consequences of having rpois return 
numerics that are always nonnegative?


   Spencer


[1]  In the code I reported before, I just changed the threshold of 1e6 
to 0.5*.Machine$integer.max.  On my Mac, .Machine$integer.max = 
2147483647 = 2^31 > 1e9.  That still means that a Poisson distributed 
pseudo-random number just under that would have to be over 23000 
standard deviations above the mean to exceed .Machine$integer.max.

>
> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves 
> mailto:spencer.gra...@prodsyse.com>> wrote:
>
>   This issue arose for me in simulations to estimate
> confidence, prediction, and tolerance intervals from glm(.,
> family=poisson) fits embedded in a BMA::bic.glm fit using a
> simulate.bic.glm function I added to the development version of
> Ecfun, available at "https://github.com/sbgraves237/Ecfun";
> . This is part of a vignette
> I'm developing, available at
> 
> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
> 
> .
> This includes a simulated mean of a mixture of Poissons that
> exceeds 2e22.  It doesn't seem unreasonable to me to have rpois
> output a numerics rather than integers when a number simulated
> exceeds .Machine$integer.max.  And it does seem to make less sense
> in such cases to return NAs.
>
>
>    Alternatively, might it make sense to add another argument
> to rpois to give the user the choice?  E.g., an argument
> "bigOutput" with (I hope) default = "numeric" and "NA" as a second
> option.  Or NA is the default, so no code that relied that feature
> of the current code would be broken by the change.  If someone
> wanted to use arbitrary precision arithmetic, they could write
> their own version of this function with "arbitraryPrecision" as an
> optional value for the "bigOutput" argument.
>
>
>   Comments?
>   Thanks,
>   Spencer Graves
>
>
>
> On 2020-01-19 10:28, Avraham Adler wrote:
>> Technically, lambda can always be numeric. It is the observations
>> which must be integral.
>>
>> Would hitting everything larger than maxint or maxlonglong with
>> floor or round fundamentally change the distribution? Well, yes,
>> but enough that it would matter over process risk?
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner > > wrote:
>>
>> So imagine rpois is changed, such that the storage mode of
>> its return
>> value is sometimes integer and sometimes numeric. Then
>> imagine the case
>> where lambda is itself a realization of a random variable. Do
>> we really
>> want the storage mode to inherit that randomness?
>>
>>
>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>> > Maybe there should be code for 64 bit R to use long long or
>> the like?
>> >
>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>> > > 
>> > >> wrote:
>> >
>> >
>> >
>> >     On 2020-01-19 09:34, Benjamin Tyner wrote:
>> >     >>
>> >
>>  
>> 
>> >     >> Hello, All:
>> >     >>
>> >     >>
>> >     >>     Consider:
>> >     >>
>> >     >>
>> >     >> Browse[2]> set.seed(1)
>> >     >> Browse[2]> rpois(9, 1e10)
>> >     >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>> >     >>
>> >     >>
>> >     >>     Should this happen?
>> >     >>
>> >     >>
>> >     >>     I think that for, say, lambda>1e6, rpois
>> should return
>> >     rnorm(.,
>> >     >> lambda, sqrt(lambda)).
>> >     > But need to implement carefully; rpois should always
>> return a
>> >     > non-negative integer, whereas rnorm always returns
>> numeric...
>>   

Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Avraham Adler
Floor (maybe round) of non-negative numerics, though. Poisson should never
have anything after decimal.

Still think it’s worth allowing long long for R64 bit, just for purity
sake.

Avi

On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves 
wrote:

>
>
> On 2020-01-19 13:01, Avraham Adler wrote:
>
> Crazy thought, but being that a sum of Poissons is Poisson in the sum, can
> you break your “big” simulation into the sum of a few smaller ones? Or is
> the order of magnitude difference just too great?
>
>
>
>   I don't perceive that as feasible.  Once I found what was generating
> NAs, it was easy to code a function to return pseudo-random numbers using
> the standard normal approximation to the Poisson for those extreme cases.
> [For a Poisson with mean = 1e6, for example, the skewness (third
> standardized moment) is 0.001.  At least for my purposes, that should be
> adequate.][1]
>
>
>   What are the negative consequences of having rpois return numerics
> that are always nonnegative?
>
>
>   Spencer
>
>
> [1]  In the code I reported before, I just changed the threshold of 1e6 to
> 0.5*.Machine$integer.max.  On my Mac, .Machine$integer.max = 2147483647 =
> 2^31 > 1e9.  That still means that a Poisson distributed pseudo-random
> number just under that would have to be over 23000 standard deviations
> above the mean to exceed .Machine$integer.max.
>
>
> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves <
> spencer.gra...@prodsyse.com> wrote:
>
>>   This issue arose for me in simulations to estimate confidence,
>> prediction, and tolerance intervals from glm(., family=poisson) fits
>> embedded in a BMA::bic.glm fit using a simulate.bic.glm function I added to
>> the development version of Ecfun, available at
>> "https://github.com/sbgraves237/Ecfun";
>> .  This is part of a vignette I'm
>> developing, available at
>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
>> .
>> This includes a simulated mean of a mixture of Poissons that exceeds 2e22.
>> It doesn't seem unreasonable to me to have rpois output a numerics rather
>> than integers when a number simulated exceeds .Machine$integer.max.  And it
>> does seem to make less sense in such cases to return NAs.
>>
>>
>>Alternatively, might it make sense to add another argument to
>> rpois to give the user the choice?  E.g., an argument "bigOutput" with (I
>> hope) default = "numeric" and "NA" as a second option.  Or NA is the
>> default, so no code that relied that feature of the current code would be
>> broken by the change.  If someone wanted to use arbitrary precision
>> arithmetic, they could write their own version of this function with
>> "arbitraryPrecision" as an optional value for the "bigOutput" argument.
>>
>>
>>   Comments?
>>   Thanks,
>>   Spencer Graves
>>
>>
>>
>> On 2020-01-19 10:28, Avraham Adler wrote:
>>
>> Technically, lambda can always be numeric. It is the observations which
>> must be integral.
>>
>> Would hitting everything larger than maxint or maxlonglong with floor or
>> round fundamentally change the distribution? Well, yes, but enough that it
>> would matter over process risk?
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner  wrote:
>>
>>> So imagine rpois is changed, such that the storage mode of its return
>>> value is sometimes integer and sometimes numeric. Then imagine the case
>>> where lambda is itself a realization of a random variable. Do we really
>>> want the storage mode to inherit that randomness?
>>>
>>>
>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>> > Maybe there should be code for 64 bit R to use long long or the like?
>>> >
>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>> > mailto:spencer.gra...@prodsyse.com>>
>>> wrote:
>>> >
>>> >
>>> >
>>> > On 2020-01-19 09:34, Benjamin Tyner wrote:
>>> > >>
>>> >
>>>  
>>> > >> Hello, All:
>>> > >>
>>> > >>
>>> > >> Consider:
>>> > >>
>>> > >>
>>> > >> Browse[2]> set.seed(1)
>>> > >> Browse[2]> rpois(9, 1e10)
>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>> > >>
>>> > >>
>>> > >> Should this happen?
>>> > >>
>>> > >>
>>> > >> I think that for, say, lambda>1e6, rpois should return
>>> > rnorm(.,
>>> > >> lambda, sqrt(lambda)).
>>> > > But need to implement carefully; rpois should always return a
>>> > > non-negative integer, whereas rnorm always returns numeric...
>>> > >
>>> >
>>> >Thanks for the reply.
>>> >
>>> >
>>> >However, I think it's not acceptable to get an NA from a
>>> > number
>>> > that cannot be expressed as an integer.  Whenever a randomly
>>> > generated
>>> > number would exceed .Machine$integer.max, the choi

Re: [Rd] rpois(9, 1e10)

2020-01-19 Thread Spencer Graves
On my Mac:


str(.Machine)
...
$ integer.max  : int 2147483647
  $ sizeof.long  : int 8
  $ sizeof.longlong  : int 8
  $ sizeof.longdouble    : int 16
  $ sizeof.pointer   : int 8


   On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise 
the same as on my Mac.


   Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits? 
log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called 
double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble = 
16 = 4 words of 4 bytes each?


   Spencer


On 2020-01-19 15:41, Avraham Adler wrote:
> Floor (maybe round) of non-negative numerics, though. Poisson should 
> never have anything after decimal.
>
> Still think it’s worth allowing long long for R64 bit, just for purity 
> sake.
>
> Avi
>
> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves 
> mailto:spencer.gra...@prodsyse.com>> wrote:
>
>
>
> On 2020-01-19 13:01, Avraham Adler wrote:
>> Crazy thought, but being that a sum of Poissons is Poisson in the
>> sum, can you break your “big” simulation into the sum of a few
>> smaller ones? Or is the order of magnitude difference just too great?
>
>
>   I don't perceive that as feasible.  Once I found what was
> generating NAs, it was easy to code a function to return
> pseudo-random numbers using the standard normal approximation to
> the Poisson for those extreme cases.  [For a Poisson with mean =
> 1e6, for example, the skewness (third standardized moment) is
> 0.001.  At least for my purposes, that should be adequate.][1]
>
>
>   What are the negative consequences of having rpois return
> numerics that are always nonnegative?
>
>
>   Spencer
>
>
> [1]  In the code I reported before, I just changed the threshold
> of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
> that a Poisson distributed pseudo-random number just under that
> would have to be over 23000 standard deviations above the mean to
> exceed .Machine$integer.max.
>
>>
>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>> > > wrote:
>>
>>   This issue arose for me in simulations to estimate
>> confidence, prediction, and tolerance intervals from glm(.,
>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>> simulate.bic.glm function I added to the development version
>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun";
>> . This is part of a
>> vignette I'm developing, available at
>> 
>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
>> 
>> .
>> This includes a simulated mean of a mixture of Poissons that
>> exceeds 2e22.  It doesn't seem unreasonable to me to have
>> rpois output a numerics rather than integers when a number
>> simulated exceeds .Machine$integer.max.  And it does seem to
>> make less sense in such cases to return NAs.
>>
>>
>>    Alternatively, might it make sense to add another
>> argument to rpois to give the user the choice?  E.g., an
>> argument "bigOutput" with (I hope) default = "numeric" and
>> "NA" as a second option.  Or NA is the default, so no code
>> that relied that feature of the current code would be broken
>> by the change.  If someone wanted to use arbitrary precision
>> arithmetic, they could write their own version of this
>> function with "arbitraryPrecision" as an optional value for
>> the "bigOutput" argument.
>>
>>
>>   Comments?
>>   Thanks,
>>   Spencer Graves
>>
>>
>>
>> On 2020-01-19 10:28, Avraham Adler wrote:
>>> Technically, lambda can always be numeric. It is the
>>> observations which must be integral.
>>>
>>> Would hitting everything larger than maxint or maxlonglong
>>> with floor or round fundamentally change the distribution?
>>> Well, yes, but enough that it would matter over process risk?
>>>
>>> Avi
>>>
>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>> mailto:bty...@gmail.com>> wrote:
>>>
>>> So imagine rpois is changed, such that the storage mode
>>> of its return
>>> value is sometimes integer and sometimes numeric. Then
>>> imagine the case
>>> where lambda is itself a realization of a random
>>> variable. Do we really
>>> want the storage mode to inherit that randomness?
>>>
>>>
>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>> > Maybe there should be code for 64 bit R to use long

Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-19 Thread Tierney, Luke
R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R wont' compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how me might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

I few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:

> On my Mac:
>
>
> str(.Machine)
> ...
> $ integer.max  : int 2147483647
>  $ sizeof.long  : int 8
>  $ sizeof.longlong  : int 8
>  $ sizeof.longdouble    : int 16
>  $ sizeof.pointer   : int 8
>
>
>   On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Window apps.

>   Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 64 bits.

Best,

luke


>
>
>   Spencer
>
>
> On 2020-01-19 15:41, Avraham Adler wrote:
>> Floor (maybe round) of non-negative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for R64 bit, just for purity
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> mailto:spencer.gra...@prodsyse.com>> wrote:
>>
>>
>>
>> On 2020-01-19 13:01, Avraham Adler wrote:
>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>> sum, can you break your “big” simulation into the sum of a few
>>> smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>>   I don't perceive that as feasible.  Once I found what was
>> generating NAs, it was easy to code a function to return
>> pseudo-random numbers using the standard normal approximation to
>> the Poisson for those extreme cases.  [For a Poisson with mean =
>> 1e6, for example, the skewness (third standardized moment) is
>> 0.001.  At least for my purposes, that should be adequate.][1]
>>
>>
>>   What are the negative consequences of having rpois return
>> numerics that are always nonnegative?
>>
>>
>>   Spencer
>>
>>
>> [1]  In the code I reported before, I just changed the threshold
>> of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>> that a Poisson distributed pseudo-random number just under that
>> would have to be over 23000 standard deviations above the mean to
>> exceed .Machine$integer.max.
>>
>>>
>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>> >> > wrote:
>>>
>>>   This issue arose for me in simulations to estimate
>>> confidence, prediction, and tolerance intervals from glm(.,
>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>> simulate.bic.glm function I added to the development version
>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun";
>>> . This is part of a
>>> vignette I'm developing, available at
>>> 
>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
>>> 
>>> .
>>> This includes a simulated mean of a mixture of Poissons that
>>> exceeds 2e22.  It doesn't seem unreasonable to me to have
>>> rpois output a numerics rather than integers when a number
>>> simulated exceeds .Machine$integer.max.  And it does seem to
>>> make less sense in such cases to return NAs.
>>>
>>>
>>>    Alternatively, might it make sense to add another
>>> argument to rpois to give the user the choice?  E.g., an
>>> argument "bigOutput" with (I hope) default = "numeric" and
>>> "NA" as a second option.  Or NA is the default, so no code
>>> that relied that feature of the current code would be broken
>>> by the change.  If someone wanted to use arbitrary precision
>>> arithmetic, they could write their own version of this
>>> function with "arbitraryPrecision" as an optional value for
>>> the "bigOutput" argument.
>>>
>>>
>>>   Comments?
>>>   Thanks,
>>>   Spencer Graves
>>>
>>>
>>>
>>> On 2020-01-19 10:28, Avraham Adler wrote:
 Te

Re: [Rd] [External] Re: rpois(9, 1e10)

2020-01-19 Thread Spencer Graves
Thanks to Luke and Avi for their comments.  I wrapped "round" around the 
call to "rnorm" inside my "rpois.".  For "lambda" really big, that 
"round" won't do anything.  However, it appears to give integers in 
floating point representation that are larger than 
.Machine$integer.max.  That sounds very much like what someone would 
want.  Spencer



On 2020-01-19 21:00, Tierney, Luke wrote:

R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R wont' compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how me might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

I few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:


On my Mac:


str(.Machine)
...
$ integer.max  : int 2147483647
  $ sizeof.long  : int 8
  $ sizeof.longlong  : int 8
  $ sizeof.longdouble    : int 16
  $ sizeof.pointer   : int 8


   On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Window apps.


   Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 64 bits.

Best,

luke




   Spencer


On 2020-01-19 15:41, Avraham Adler wrote:

Floor (maybe round) of non-negative numerics, though. Poisson should
never have anything after decimal.

Still think it’s worth allowing long long for R64 bit, just for purity
sake.

Avi

On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
mailto:spencer.gra...@prodsyse.com>> wrote:



 On 2020-01-19 13:01, Avraham Adler wrote:

 Crazy thought, but being that a sum of Poissons is Poisson in the
 sum, can you break your “big” simulation into the sum of a few
 smaller ones? Or is the order of magnitude difference just too great?


   I don't perceive that as feasible.  Once I found what was
 generating NAs, it was easy to code a function to return
 pseudo-random numbers using the standard normal approximation to
 the Poisson for those extreme cases.  [For a Poisson with mean =
 1e6, for example, the skewness (third standardized moment) is
 0.001.  At least for my purposes, that should be adequate.][1]


   What are the negative consequences of having rpois return
 numerics that are always nonnegative?


   Spencer


 [1]  In the code I reported before, I just changed the threshold
 of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
 .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
 that a Poisson distributed pseudo-random number just under that
 would have to be over 23000 standard deviations above the mean to
 exceed .Machine$integer.max.


 On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
 mailto:spencer.gra...@prodsyse.com>> wrote:

   This issue arose for me in simulations to estimate
 confidence, prediction, and tolerance intervals from glm(.,
 family=poisson) fits embedded in a BMA::bic.glm fit using a
 simulate.bic.glm function I added to the development version
 of Ecfun, available at "https://github.com/sbgraves237/Ecfun";
 . This is part of a
 vignette I'm developing, available at
 
"https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd";
 
.
 This includes a simulated mean of a mixture of Poissons that
 exceeds 2e22.  It doesn't seem unreasonable to me to have
 rpois output a numerics rather than integers when a number
 simulated exceeds .Machine$integer.max.  And it does seem to
 make less sense in such cases to return NAs.


    Alternatively, might it make sense to add another
 argument to rpois to give the user the choice?  E.g., an
 argument "bigOutput" with (I hope) default = "numeric" and
 "NA" as a second option.  Or NA is the default, so no code
 that relied that feature of the current code would be broken
 by the change.  If someone wanted to use arbitrary precision
 arithmetic, they could write their own version of this
 function with "arbitraryPrecision" as an optional value for
 the "bigOutput" argument.