On Thu, Dec 17, 2009 at 03:10:49PM +0100, Martin Maechler wrote: [...] > MM> This, of course, is an even more compelling reason to implement > MM> the change of return log(abs(choose(.,.)), > MM> and at the moment, I'd even plan to "backport" that to R "2.10.1 > MM> patched", as the current behavior is clearly bugous. > > This has now happened; > svn revisions 50775 and 50776 {R-devel and R-patched}.
Thank you. When preparing a test, whether lchoose(n, k) \approx log(abs(choose(n, k))), it appeared that there is a minor problem also in choose(n, k), when n is very close to k, but not equal. options(digits=15) n <- 5 + (-15:15)*1e-8 cbind(n, choose(n, 5)) # n # [1,] 4.99999985 0.999999657500042 # [2,] 4.99999986 0.999999680333370 # [3,] 4.99999987 0.999999703166698 # [4,] 4.99999988 0.999999726000027 # [5,] 4.99999989 0.999999748833356 # [6,] 4.99999990 0.999999771666685 # [7,] 4.99999991 0.000000000000000 # [8,] 4.99999992 0.000000000000000 # [9,] 4.99999993 0.000000000000000 # [10,] 4.99999994 0.000000000000000 # [11,] 4.99999995 0.000000000000000 # [12,] 4.99999996 0.000000000000000 # [13,] 4.99999997 0.000000000000000 # [14,] 4.99999998 0.000000000000000 # [15,] 4.99999999 0.000000000000000 # [16,] 5.00000000 1.000000000000000 # [17,] 5.00000001 5.000000000000000 # [18,] 5.00000002 5.000000000000000 # [19,] 5.00000003 5.000000000000000 # [20,] 5.00000004 5.000000000000000 # [21,] 5.00000005 5.000000000000000 # [22,] 5.00000006 5.000000000000000 # [23,] 5.00000007 5.000000000000000 # [24,] 5.00000008 5.000000000000000 # [25,] 5.00000009 5.000000000000000 # [26,] 5.00000010 1.000000228333353 # [27,] 5.00000011 1.000000251166690 # [28,] 5.00000012 1.000000274000027 # [29,] 5.00000013 1.000000296833365 # [30,] 5.00000014 1.000000319666704 # [31,] 5.00000015 1.000000342500042 All the values in the second column should be close to 1, but some are 0 and some are 5. The reason seems to be in the line 112 of src/nmath/choose.c, which is if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ If n is not exactly an integer, then k becomes also non-integer. Since the code relies on k being an exact integer, we get an error as follows. If n = k - eps for a small positive eps, then the next line 113 if (k < 0) return 0.; determines the output since k is now - eps. If n = k + eps for a small positive eps, then the line 116 r = n; determines the output, since now k = eps, so the condition j <= k is not satisfied already at the beginning of the next cycle. I suggest two solutions, a more conservative one and a less conservative one. A more conservative solution is to replace the line 112 by if(n-k < k && n >= 0 && R_IS_INT(n)) k = floor(n - k + 0.5); /* <- Symmetry */ This yields the following in the same test as above. # n # [1,] 4.99999985 0.999999657500042 # [2,] 4.99999986 0.999999680333370 # [3,] 4.99999987 0.999999703166698 # [4,] 4.99999988 0.999999726000027 # [5,] 4.99999989 0.999999748833356 # [6,] 4.99999990 0.999999771666685 # [7,] 4.99999991 1.000000000000000 # [8,] 4.99999992 1.000000000000000 # [9,] 4.99999993 1.000000000000000 # [10,] 4.99999994 1.000000000000000 # [11,] 4.99999995 1.000000000000000 # [12,] 4.99999996 1.000000000000000 # [13,] 4.99999997 1.000000000000000 # [14,] 4.99999998 1.000000000000000 # [15,] 4.99999999 1.000000000000000 # [16,] 5.00000000 1.000000000000000 # [17,] 5.00000001 1.000000000000000 # [18,] 5.00000002 1.000000000000000 # [19,] 5.00000003 1.000000000000000 # [20,] 5.00000004 1.000000000000000 # [21,] 5.00000005 1.000000000000000 # [22,] 5.00000006 1.000000000000000 # [23,] 5.00000007 1.000000000000000 # [24,] 5.00000008 1.000000000000000 # [25,] 5.00000009 1.000000000000000 # [26,] 5.00000010 1.000000228333353 # [27,] 5.00000011 1.000000251166690 # [28,] 5.00000012 1.000000274000027 # [29,] 5.00000013 1.000000296833365 # [30,] 5.00000014 1.000000319666704 # [31,] 5.00000015 1.000000342500042 So, the extreme values 0 and 5 do not occur, but there is an interval of constant values and on both ends of this interval, there is a jump much larger than machine accuracy. A less conservative solution (a patch is in an attachment) replaces lines 110 - 121 by if (k < k_small_max) { int j; if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */ if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ r = n; for(j = 2; j <= k; j++) { r *= (n-j+1); r /= j; } return r; } Here, the symetry is used only for exact integers. The symetry is valid only for integers, since choose(n, k) is a polynomial of degree k. So, for different values of k, we get a different functions on any interval of positive length. The division at the line r /= j; always produces an exact integer, since during the whole cycle, r is always either an integer binomial coefficient or its integer multiple. So, the division has an integer result and there is no rounding error unless the intermediate result does not fit into 53 bit precison. Using the above code, we get options(digits=15) n <- 5 + (-15:15)*1e-8 cbind(n, choose(n, 5)) # n # [1,] 4.99999985 0.999999657500042 # [2,] 4.99999986 0.999999680333370 # [3,] 4.99999987 0.999999703166698 # [4,] 4.99999988 0.999999726000027 # [5,] 4.99999989 0.999999748833356 # [6,] 4.99999990 0.999999771666685 # [7,] 4.99999991 0.999999794500014 # [8,] 4.99999992 0.999999817333345 # [9,] 4.99999993 0.999999840166677 # [10,] 4.99999994 0.999999863000008 # [11,] 4.99999995 0.999999885833339 # [12,] 4.99999996 0.999999908666670 # [13,] 4.99999997 0.999999931500002 # [14,] 4.99999998 0.999999954333335 # [15,] 4.99999999 0.999999977166667 # [16,] 5.00000000 1.000000000000000 # [17,] 5.00000001 1.000000022833334 # [18,] 5.00000002 1.000000045666667 # [19,] 5.00000003 1.000000068500001 # [20,] 5.00000004 1.000000091333336 # [21,] 5.00000005 1.000000114166671 # [22,] 5.00000006 1.000000137000006 # [23,] 5.00000007 1.000000159833342 # [24,] 5.00000008 1.000000182666680 # [25,] 5.00000009 1.000000205500016 # [26,] 5.00000010 1.000000228333353 # [27,] 5.00000011 1.000000251166690 # [28,] 5.00000012 1.000000274000027 # [29,] 5.00000013 1.000000296833365 # [30,] 5.00000014 1.000000319666704 # [31,] 5.00000015 1.000000342500042 Within the chosen accuracy 1e-8, there is no visible jump. With 1e-15, we get options(digits=15) n <- 5 + (-15:15)*1e-15 cbind(n, choose(n, 5)) # n # [1,] 4.99999999999998 0.999999999999966 # [2,] 4.99999999999999 0.999999999999967 # [3,] 4.99999999999999 0.999999999999970 # [4,] 4.99999999999999 0.999999999999972 # [5,] 4.99999999999999 0.999999999999976 # [6,] 4.99999999999999 0.999999999999978 # [7,] 4.99999999999999 0.999999999999980 # [8,] 4.99999999999999 0.999999999999982 # [9,] 4.99999999999999 0.999999999999984 # [10,] 4.99999999999999 0.999999999999986 # [11,] 4.99999999999999 0.999999999999988 # [12,] 5.00000000000000 0.999999999999990 # [13,] 5.00000000000000 0.999999999999994 # [14,] 5.00000000000000 0.999999999999996 # [15,] 5.00000000000000 0.999999999999998 # [16,] 5.00000000000000 1.000000000000000 # [17,] 5.00000000000000 1.000000000000002 # [18,] 5.00000000000000 1.000000000000004 # [19,] 5.00000000000000 1.000000000000006 # [20,] 5.00000000000000 1.000000000000010 # [21,] 5.00000000000001 1.000000000000012 # [22,] 5.00000000000001 1.000000000000014 # [23,] 5.00000000000001 1.000000000000016 # [24,] 5.00000000000001 1.000000000000018 # [25,] 5.00000000000001 1.000000000000020 # [26,] 5.00000000000001 1.000000000000022 # [27,] 5.00000000000001 1.000000000000024 # [28,] 5.00000000000001 1.000000000000028 # [29,] 5.00000000000001 1.000000000000031 # [30,] 5.00000000000001 1.000000000000032 # [31,] 5.00000000000002 1.000000000000035 So, the jump in the exactly integer value n[16] == 5 is comparable to machine accuracy. I would be pleased to provide more tests, if some of the above solutions or their modifications can be accepted. Petr Savicky.
--- R-devel_2009-12-16/src/nmath/choose.c 2008-03-17 17:52:35.000000000 +0100 +++ R-devel_less_conservative/src/nmath/choose.c 2009-12-17 22:43:54.000000000 +0100 @@ -107,19 +107,20 @@ #endif if (fabs(k - k0) > 1e-7) MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); if (k < k_small_max) { int j; - if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ + if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */ if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ r = n; - for(j = 2; j <= k; j++) - r *= (n-j+1)/j; - return R_IS_INT(n) ? floor(r + 0.5) : r; - /* might have got rounding errors */ + for(j = 2; j <= k; j++) { + r *= (n-j+1); + r /= j; + } + return r; } /* else: k >= k_small_max */ if (n < 0) { r = choose(-n+ k-1, k); if (ODD(k)) r = -r;
______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel