{I've changed the subject; it's really no longer about lchoose()'s definition}
>>>>> Petr Savicky <savi...@cs.cas.cz> >>>>> on Fri, 18 Dec 2009 00:14:49 +0100 writes: > On Thu, Dec 17, 2009 at 03:10:49PM +0100, Martin Maechler wrote: > [...] MM> This, of course, is an even more compelling reason to implement MM> the change of return log(abs(choose(.,.)), MM> and at the moment, I'd even plan to "backport" that to R "2.10.1 MM> patched", as the current behavior is clearly bugous. >> >> This has now happened; >> svn revisions 50775 and 50776 {R-devel and R-patched}. > Thank you. > When preparing a test, whether lchoose(n, k) \approx log(abs(choose(n, k))), it > appeared that there is a minor problem also in choose(n, k), when n is very close > to k, but not equal. > options(digits=15) > n <- 5 + (-15:15)*1e-8 > cbind(n, choose(n, 5)) > # n > # [1,] 4.99999985 0.999999657500042 > # [2,] 4.99999986 0.999999680333370 > # [3,] 4.99999987 0.999999703166698 > # [4,] 4.99999988 0.999999726000027 > # [5,] 4.99999989 0.999999748833356 > # [6,] 4.99999990 0.999999771666685 > # [7,] 4.99999991 0.000000000000000 > # [8,] 4.99999992 0.000000000000000 > # [9,] 4.99999993 0.000000000000000 > # [10,] 4.99999994 0.000000000000000 > # [11,] 4.99999995 0.000000000000000 > # [12,] 4.99999996 0.000000000000000 > # [13,] 4.99999997 0.000000000000000 > # [14,] 4.99999998 0.000000000000000 > # [15,] 4.99999999 0.000000000000000 > # [16,] 5.00000000 1.000000000000000 > # [17,] 5.00000001 5.000000000000000 > # [18,] 5.00000002 5.000000000000000 > # [19,] 5.00000003 5.000000000000000 > # [20,] 5.00000004 5.000000000000000 > # [21,] 5.00000005 5.000000000000000 > # [22,] 5.00000006 5.000000000000000 > # [23,] 5.00000007 5.000000000000000 > # [24,] 5.00000008 5.000000000000000 > # [25,] 5.00000009 5.000000000000000 > # [26,] 5.00000010 1.000000228333353 > # [27,] 5.00000011 1.000000251166690 > # [28,] 5.00000012 1.000000274000027 > # [29,] 5.00000013 1.000000296833365 > # [30,] 5.00000014 1.000000319666704 > # [31,] 5.00000015 1.000000342500042 > All the values in the second column should be close to 1, but some are 0 and > some are 5. The reason seems to be in the line 112 of src/nmath/choose.c, > which is > if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ > If n is not exactly an integer, then k becomes also non-integer. > Since the code relies on k being an exact integer, we get an error > as follows. > If n = k - eps for a small positive eps, then the next line 113 > if (k < 0) return 0.; > determines the output since k is now - eps. > If n = k + eps for a small positive eps, then the line 116 > r = n; > determines the output, since now k = eps, so the condition j <= k is not > satisfied already at the beginning of the next cycle. > I suggest two solutions, a more conservative one and a less conservative one. > A more conservative solution is to replace the line 112 by > if(n-k < k && n >= 0 && R_IS_INT(n)) k = floor(n - k + 0.5); /* <- Symmetry */ > This yields the following in the same test as above. > # n > # [1,] 4.99999985 0.999999657500042 > # [2,] 4.99999986 0.999999680333370 > # [3,] 4.99999987 0.999999703166698 > # [4,] 4.99999988 0.999999726000027 > # [5,] 4.99999989 0.999999748833356 > # [6,] 4.99999990 0.999999771666685 > # [7,] 4.99999991 1.000000000000000 > # [8,] 4.99999992 1.000000000000000 > # [9,] 4.99999993 1.000000000000000 > # [10,] 4.99999994 1.000000000000000 > # [11,] 4.99999995 1.000000000000000 > # [12,] 4.99999996 1.000000000000000 > # [13,] 4.99999997 1.000000000000000 > # [14,] 4.99999998 1.000000000000000 > # [15,] 4.99999999 1.000000000000000 > # [16,] 5.00000000 1.000000000000000 > # [17,] 5.00000001 1.000000000000000 > # [18,] 5.00000002 1.000000000000000 > # [19,] 5.00000003 1.000000000000000 > # [20,] 5.00000004 1.000000000000000 > # [21,] 5.00000005 1.000000000000000 > # [22,] 5.00000006 1.000000000000000 > # [23,] 5.00000007 1.000000000000000 > # [24,] 5.00000008 1.000000000000000 > # [25,] 5.00000009 1.000000000000000 > # [26,] 5.00000010 1.000000228333353 > # [27,] 5.00000011 1.000000251166690 > # [28,] 5.00000012 1.000000274000027 > # [29,] 5.00000013 1.000000296833365 > # [30,] 5.00000014 1.000000319666704 > # [31,] 5.00000015 1.000000342500042 > So, the extreme values 0 and 5 do not occur, but there is an interval > of constant values and on both ends of this interval, there is a jump > much larger than machine accuracy. > A less conservative solution (a patch is in an attachment) replaces > lines 110 - 121 by > if (k < k_small_max) { > int j; > if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */ > if (k < 0) return 0.; > if (k == 0) return 1.; > /* else: k >= 1 */ > r = n; > for(j = 2; j <= k; j++) { > r *= (n-j+1); > r /= j; > } > return r; > } > Here, the symetry is used only for exact integers. The symetry is > valid only for integers, since choose(n, k) is a polynomial of > degree k. So, for different values of k, we get a different functions > on any interval of positive length. > The division at the line > r /= j; > always produces an exact integer, since during the whole cycle, r is always > either an integer binomial coefficient or its integer multiple. So, the > division has an integer result and there is no rounding error unless the > intermediate result does not fit into 53 bit precison. > Using the above code, we get > options(digits=15) > n <- 5 + (-15:15)*1e-8 > cbind(n, choose(n, 5)) > # n > # [1,] 4.99999985 0.999999657500042 > # [2,] 4.99999986 0.999999680333370 > # [3,] 4.99999987 0.999999703166698 > # [4,] 4.99999988 0.999999726000027 > # [5,] 4.99999989 0.999999748833356 > # [6,] 4.99999990 0.999999771666685 > # [7,] 4.99999991 0.999999794500014 > # [8,] 4.99999992 0.999999817333345 > # [9,] 4.99999993 0.999999840166677 > # [10,] 4.99999994 0.999999863000008 > # [11,] 4.99999995 0.999999885833339 > # [12,] 4.99999996 0.999999908666670 > # [13,] 4.99999997 0.999999931500002 > # [14,] 4.99999998 0.999999954333335 > # [15,] 4.99999999 0.999999977166667 > # [16,] 5.00000000 1.000000000000000 > # [17,] 5.00000001 1.000000022833334 > # [18,] 5.00000002 1.000000045666667 > # [19,] 5.00000003 1.000000068500001 > # [20,] 5.00000004 1.000000091333336 > # [21,] 5.00000005 1.000000114166671 > # [22,] 5.00000006 1.000000137000006 > # [23,] 5.00000007 1.000000159833342 > # [24,] 5.00000008 1.000000182666680 > # [25,] 5.00000009 1.000000205500016 > # [26,] 5.00000010 1.000000228333353 > # [27,] 5.00000011 1.000000251166690 > # [28,] 5.00000012 1.000000274000027 > # [29,] 5.00000013 1.000000296833365 > # [30,] 5.00000014 1.000000319666704 > # [31,] 5.00000015 1.000000342500042 > Within the chosen accuracy 1e-8, there is no visible jump. > With 1e-15, we get > options(digits=15) > n <- 5 + (-15:15)*1e-15 > cbind(n, choose(n, 5)) > # n > # [1,] 4.99999999999998 0.999999999999966 > # [2,] 4.99999999999999 0.999999999999967 > # [3,] 4.99999999999999 0.999999999999970 > # [4,] 4.99999999999999 0.999999999999972 > # [5,] 4.99999999999999 0.999999999999976 > # [6,] 4.99999999999999 0.999999999999978 > # [7,] 4.99999999999999 0.999999999999980 > # [8,] 4.99999999999999 0.999999999999982 > # [9,] 4.99999999999999 0.999999999999984 > # [10,] 4.99999999999999 0.999999999999986 > # [11,] 4.99999999999999 0.999999999999988 > # [12,] 5.00000000000000 0.999999999999990 > # [13,] 5.00000000000000 0.999999999999994 > # [14,] 5.00000000000000 0.999999999999996 > # [15,] 5.00000000000000 0.999999999999998 > # [16,] 5.00000000000000 1.000000000000000 > # [17,] 5.00000000000000 1.000000000000002 > # [18,] 5.00000000000000 1.000000000000004 > # [19,] 5.00000000000000 1.000000000000006 > # [20,] 5.00000000000000 1.000000000000010 > # [21,] 5.00000000000001 1.000000000000012 > # [22,] 5.00000000000001 1.000000000000014 > # [23,] 5.00000000000001 1.000000000000016 > # [24,] 5.00000000000001 1.000000000000018 > # [25,] 5.00000000000001 1.000000000000020 > # [26,] 5.00000000000001 1.000000000000022 > # [27,] 5.00000000000001 1.000000000000024 > # [28,] 5.00000000000001 1.000000000000028 > # [29,] 5.00000000000001 1.000000000000031 > # [30,] 5.00000000000001 1.000000000000032 > # [31,] 5.00000000000002 1.000000000000035 > So, the jump in the exactly integer value n[16] == 5 is comparable > to machine accuracy. > I would be pleased to provide more tests, if some of the above solutions > or their modifications can be accepted. > Petr Savicky. [.......................] Thanks a lot, Petr, for your explorations. I agree that at least something like your conservative change should happen. Personally, I'd agree with you and prefer the "progressive" (non-conservative) change, basically getting rid of all R_IS_INT(n) tests, which seem kludgey. OTOH, the R core members who did start using R_IS_INT(n) must have had good reasons with another set examples. Have you tried 'make check-devel' (or 'check-all') also with the progressive change? I'll hope to get back to this on Monday. Martin ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel