Re: [Rd] [R] choose(n, k) as n approaches k

Peter Dalgaard Fri, 17 Jan 2020 02:36:48 -0800


More importantly, the gamma function is not nice for negative integer 
arguments, so gchoose() blows up for integer k and n=-1,-2,...


> gchoose(-2,4)
[1] NaN
Warning messages:
1: In gamma(n + 1) : NaNs produced
2: In gamma(n + 1 - k) : NaNs produced
> choose(-2,4)
[1] 5

and so does the formula with beta(). 

Of course, choose(-r, k) is exactly what you need for the traditional 
formulation of the negative binomial distribution as the number of successes to 
get r failures. 

-pd

> On 15 Jan 2020, at 01:25 , peter dalgaard <[email protected]> wrote:
> 
> That crossed my mind too, but presumably someone designed choose() to handle 
> the near-integer cases specially. Otherwise, we already have beta() -- you 
> just need to remember what the connection is ;-). 
> 
> I would expect that it has to do with the binomial and negative binomial 
> distributions, but I can't offhand picture a calculation that leads to 
> integer k, n plus/minus a tiny numerical error of the sort that one may 
> encounter with, say, seq(). 
> 
> -pd
> 
> ;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts
> 
>> On 14 Jan 2020, at 19:36 , John Mount <[email protected]> wrote:
>> 
>> 
>> At the risk of throwing oil on a fire.  If we are talking about fractional 
>> values of choose() doesn't it make sense to look to the gamma function for 
>> the correct analytic continuation?  In particular k<0 may not imply the 
>> function should evaluate to zero until we get k<=-1.
>> 
>> Example:
>> 
>> ``` r
>> choose(5, 4)
>> #> [1] 5
>> 
>> gchoose <- function(n, k) { 
>>  gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
>> }
>> 
>> gchoose(5, 4)
>> #> [1] 5
>> gchoose(5, 0)
>> #> [1] 1
>> gchoose(5, -0.5)
>> #> [1] 0.2351727
>> ```
>> 
>>> On Jan 14, 2020, at 10:20 AM, peter dalgaard <[email protected]> wrote:
>>> 
>>> OK, I see what you mean. But in those cases, we don't get the catastrophic 
>>> failures from the 
>>> 
>>>       if (k <  0) return 0.;
>>>       if (k == 0) return 1.;
>>>       /* else: k >= 1 */
>>> 
>>> part, because at that point k is sure to be integer, possibly after 
>>> rounding. 
>>> 
>>> It is when n-k is approximately but not exactly zero and we should return 
>>> 1, that we either return 0 (negative case) or n (positive case; because the 
>>> n(n-1)(n-2)... product has at least one factor). In the other cases, we get 
>>> 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to 
>>> produce an integer, due to the
>>> 
>>>       return R_IS_INT(n) ? R_forceint(r) : r;
>>> 
>>> part.
>>> 
>>> -pd
>>> 
>>> 
>>> 
>>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[email protected]> wrote:
>>>> 
>>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>>> Yep, that looks wrong (probably want to continue discussion over on 
>>>>>>> R-devel)
>>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>>>    if (k < k_small_max) {
>>>>>>>       int j;
>>>>>>>       if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>>>       if (k <  0) return 0.;
>>>>>>>       if (k == 0) return 1.;
>>>>>>>       /* else: k >= 1 */
>>>>>>> if n is a near-integer, then k can become non-integer and negative. In 
>>>>>>> your case,
>>>>>>> n == 4 - 1e-7
>>>>>>> k == 4
>>>>>>> n - k == -1e-7 < 4
>>>>>>> n >= 0
>>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>>>> so k gets set to
>>>>>>> n - k == -1e-7
>>>>>>> which is less than 0, so we return 0. However, as you point out, 1 
>>>>>>> would be more reasonable and in accordance with the limit as n -> 4, 
>>>>>>> e.g.
>>>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>>>> [1] -9.289025e-11
>>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) 
>>>>>>> in the k = n - k step.
>>>>>> 
>>>>>> I think that would break symmetry:  you want choose(n, k) to equal 
>>>>>> choose(n, n-k) when n is very close to an integer.  So I'd suggest the 
>>>>>> replacement whenever R_IS_INT(n) is true.
>>>>>> 
>>>>> But choose() very deliberately ensures that k is integer, so choose(n, 
>>>>> n-k) is ill-defined for non-integer n.
>>>> 
>>>> That's only true if there's a big difference.  I'd be worried about cases 
>>>> where n and k are close to integers (within 1e-7).  In those cases, k is 
>>>> silently rounded to integer.  As I read your suggestion, n would only be 
>>>> rounded to integer if k > n-k.  I think both n and k should be rounded to 
>>>> integer in this near-integer situation, regardless of the value of k.
>>>> 
>>>> I believe that lchoose(n, k) already does this.
>>>> 
>>>> Duncan Murdoch
>>>> 
>>>>>   double r, k0 = k;
>>>>>   k = R_forceint(k);
>>>>> ...
>>>>>   if (fabs(k - k0) > 1e-7)
>>>>>       MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), 
>>>>> k0, k);
>>>>> 
>>>>>> Duncan Murdoch
>>>>>> 
>>>>>>> -pd
>>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> This struck me as incorrect:
>>>>>>>> 
>>>>>>>>> choose(3.999999, 4)
>>>>>>>> [1] 0.9999979
>>>>>>>>> choose(3.9999999, 4)
>>>>>>>> [1] 0
>>>>>>>>> choose(4, 4)
>>>>>>>> [1] 1
>>>>>>>>> choose(4.0000001, 4)
>>>>>>>> [1] 4
>>>>>>>>> choose(4.000001, 4)
>>>>>>>> [1] 1.000002
>>>>>>>> 
>>>>>>>> Should base::choose(n, k) check whether n is within machine precision 
>>>>>>>> of k and return 1?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Erik
>>>>>>>> 
>>>>>>>> ***
>>>>>>>> sessionInfo()
>>>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>> 
>>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>> 
>>>>>>>> ______________________________________________
>>>>>>>> [email protected] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide 
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> -- 
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: [email protected]  Priv: [email protected]
>>> 
>>> ______________________________________________
>>> [email protected] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> ---------------
>> John Mount
>> http://www.win-vector.com/ 
>> Our book: Practical Data Science with R
>> http://practicaldatascience.com 
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [email protected]  Priv: [email protected]
> 
> 
> 
> 
> 
> 
> 
> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [email protected]  Priv: [email protected]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [R] choose(n, k) as n approaches k

Reply via email to