More importantly, the gamma function is not nice for negative integer arguments, so gchoose() blows up for integer k and n=-1,-2,...
> gchoose(-2,4) [1] NaN Warning messages: 1: In gamma(n + 1) : NaNs produced 2: In gamma(n + 1 - k) : NaNs produced > choose(-2,4) [1] 5 and so does the formula with beta(). Of course, choose(-r, k) is exactly what you need for the traditional formulation of the negative binomial distribution as the number of successes to get r failures. -pd > On 15 Jan 2020, at 01:25 , peter dalgaard <pda...@gmail.com> wrote: > > That crossed my mind too, but presumably someone designed choose() to handle > the near-integer cases specially. Otherwise, we already have beta() -- you > just need to remember what the connection is ;-). > > I would expect that it has to do with the binomial and negative binomial > distributions, but I can't offhand picture a calculation that leads to > integer k, n plus/minus a tiny numerical error of the sort that one may > encounter with, say, seq(). > > -pd > > ;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts > >> On 14 Jan 2020, at 19:36 , John Mount <jmo...@win-vector.com> wrote: >> >> >> At the risk of throwing oil on a fire. If we are talking about fractional >> values of choose() doesn't it make sense to look to the gamma function for >> the correct analytic continuation? In particular k<0 may not imply the >> function should evaluate to zero until we get k<=-1. >> >> Example: >> >> ``` r >> choose(5, 4) >> #> [1] 5 >> >> gchoose <- function(n, k) { >> gamma(n+1)/(gamma(n+1-k) * gamma(k+1)) >> } >> >> gchoose(5, 4) >> #> [1] 5 >> gchoose(5, 0) >> #> [1] 1 >> gchoose(5, -0.5) >> #> [1] 0.2351727 >> ``` >> >>> On Jan 14, 2020, at 10:20 AM, peter dalgaard <pda...@gmail.com> wrote: >>> >>> OK, I see what you mean. But in those cases, we don't get the catastrophic >>> failures from the >>> >>> if (k < 0) return 0.; >>> if (k == 0) return 1.; >>> /* else: k >= 1 */ >>> >>> part, because at that point k is sure to be integer, possibly after >>> rounding. >>> >>> It is when n-k is approximately but not exactly zero and we should return >>> 1, that we either return 0 (negative case) or n (positive case; because the >>> n(n-1)(n-2)... product has at least one factor). In the other cases, we get >>> 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to >>> produce an integer, due to the >>> >>> return R_IS_INT(n) ? R_forceint(r) : r; >>> >>> part. >>> >>> -pd >>> >>> >>> >>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.dun...@gmail.com> wrote: >>>> >>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.dun...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>>>>> Yep, that looks wrong (probably want to continue discussion over on >>>>>>> R-devel) >>>>>>> I think the culprit is here (in src/nmath/choose.c) >>>>>>> if (k < k_small_max) { >>>>>>> int j; >>>>>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>>>> if (k < 0) return 0.; >>>>>>> if (k == 0) return 1.; >>>>>>> /* else: k >= 1 */ >>>>>>> if n is a near-integer, then k can become non-integer and negative. In >>>>>>> your case, >>>>>>> n == 4 - 1e-7 >>>>>>> k == 4 >>>>>>> n - k == -1e-7 < 4 >>>>>>> n >= 0 >>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>>>>> so k gets set to >>>>>>> n - k == -1e-7 >>>>>>> which is less than 0, so we return 0. However, as you point out, 1 >>>>>>> would be more reasonable and in accordance with the limit as n -> 4, >>>>>>> e.g. >>>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>>>>> [1] -9.289025e-11 >>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) >>>>>>> in the k = n - k step. >>>>>> >>>>>> I think that would break symmetry: you want choose(n, k) to equal >>>>>> choose(n, n-k) when n is very close to an integer. So I'd suggest the >>>>>> replacement whenever R_IS_INT(n) is true. >>>>>> >>>>> But choose() very deliberately ensures that k is integer, so choose(n, >>>>> n-k) is ill-defined for non-integer n. >>>> >>>> That's only true if there's a big difference. I'd be worried about cases >>>> where n and k are close to integers (within 1e-7). In those cases, k is >>>> silently rounded to integer. As I read your suggestion, n would only be >>>> rounded to integer if k > n-k. I think both n and k should be rounded to >>>> integer in this near-integer situation, regardless of the value of k. >>>> >>>> I believe that lchoose(n, k) already does this. >>>> >>>> Duncan Murdoch >>>> >>>>> double r, k0 = k; >>>>> k = R_forceint(k); >>>>> ... >>>>> if (fabs(k - k0) > 1e-7) >>>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), >>>>> k0, k); >>>>> >>>>>> Duncan Murdoch >>>>>> >>>>>>> -pd >>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <eswri...@pitt.edu> >>>>>>>> wrote: >>>>>>>> >>>>>>>> This struck me as incorrect: >>>>>>>> >>>>>>>>> choose(3.999999, 4) >>>>>>>> [1] 0.9999979 >>>>>>>>> choose(3.9999999, 4) >>>>>>>> [1] 0 >>>>>>>>> choose(4, 4) >>>>>>>> [1] 1 >>>>>>>>> choose(4.0000001, 4) >>>>>>>> [1] 4 >>>>>>>>> choose(4.000001, 4) >>>>>>>> [1] 1.000002 >>>>>>>> >>>>>>>> Should base::choose(n, k) check whether n is within machine precision >>>>>>>> of k and return 1? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Erik >>>>>>>> >>>>>>>> *** >>>>>>>> sessionInfo() >>>>>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>>>>> Running under: macOS High Sierra 10.13.6 >>>>>>>> >>>>>>>> [[alternative HTML version deleted]] >>>>>>>> >>>>>>>> ______________________________________________ >>>>>>>> r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>>>> PLEASE do read the posting guide >>>>>>>> http://www.R-project.org/posting-guide.html >>>>>>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> -- >>> Peter Dalgaard, Professor, >>> Center for Statistics, Copenhagen Business School >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >>> Phone: (+45)38153501 >>> Office: A 4.23 >>> Email: pd....@cbs.dk Priv: pda...@gmail.com >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> --------------- >> John Mount >> http://www.win-vector.com/ >> Our book: Practical Data Science with R >> http://practicaldatascience.com >> >> >> >> >> > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd....@cbs.dk Priv: pda...@gmail.com > > > > > > > > > -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel