Hello,
A solution is to use package cubature; both the unscaled and scaled
versions return the correct result, 3.575294.
But the performance penalty is big, because the number of function
evaluations is much larger.
library(cubature)
cubintegrate(f, -Inf, 0, method = "hcubature")
#$integral
#[1] 3.575294
#
#$error
#[1] 1.494242e-07
#
#$neval
#[1] 375
#
#$returnCode
#[1] 0
cubintegrate(f, -Inf, 0, method = "hcubature", numstab = sc)
#$integral
#[1] 3.575294
#
#$error
#[1] 1.064195e-05
#
#$neval
#[1] 255
#
#$returnCode
#[1] 0
library(microbenchmark)
microbenchmark(
base1 = integrate(f, -Inf, 0),
base2 = integrate(f, -Inf, 0, numstab = sc),
cuba1 = cubintegrate(f, -Inf, 0, method = "hcubature"),
cuba2 = cubintegrate(f, -Inf, 0, method = "hcubature", numstab = sc)
)
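If the extra evaluations are too costly, one option (a sketch, not tested
on this exact problem) is to relax the tolerance or cap the number of
evaluations via the relTol and maxEval arguments of cubintegrate():

# Looser tolerance and a hard cap on evaluations; inspect $error and
# $returnCode before trusting the result.
cubintegrate(f, -Inf, 0, method = "hcubature", relTol = 1e-5, maxEval = 1000)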
Hope this helps,
Rui Barradas
At 15:52 on 12/04/19, Andreï V. Kostyrka wrote:
Dear all,
This is the first time I am posting to the r-devel list. On
StackOverflow, it was suggested that the strange behaviour of integrate()
looked more like a bug. I am providing a short version of the question
(the full one, with plots: https://stackoverflow.com/q/55639401).
Suppose one wants to integrate a function that is just a product of two
density functions (like gamma). The support of the random variable is
(-Inf, 0]. The scale parameter of the distribution is quite small
(around 0.01), so the standard integration routine would often fail to
integrate a function that is non-zero only on a very small section of the
negative half-line (like [-0.02, -0.01], where it takes huge values, and
is almost 0 everywhere else). R’s integrate would often return the machine
epsilon as a result. So I stretch the function around zero by the
inverse of the scale parameter, compute the integral, and then divide it
by the scale. Sometimes this re-scaling also failed, so I computed both
whenever the first result was very small.
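A minimal sketch of that fallback, using the f and sc defined in the MWE
below (where the numstab factor already handles the rescaling); the 1e-8
threshold is only for illustration:

res1 <- integrate(f, -Inf, 0)$value
res  <- if (res1 > 1e-8) res1 else integrate(f, -Inf, 0, numstab = sc)$value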
Today, integration of the rescaled function suddenly yielded a value of
1.5 instead of 3.5 (not even zero). The MWE is below.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh,
scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab
curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value : num 3.58
# $ abs.error : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value : num 1.5 # What?!
# $ abs.error : num 0.000145 # What?!
# $ subdivisions: int 2
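A quick way to see what happened is to record where integrate() actually
evaluates the scaled integrand (a diagnostic sketch, not part of the
original output):

eval_pts <- new.env(); eval_pts$x <- numeric(0)
f_traced <- function(x, ...) { eval_pts$x <- c(eval_pts$x, x); f(x, ...) }
str(integrate(f_traced, -Inf, 0, numstab = sc))
length(eval_pts$x)  # how many points were used
range(eval_pts$x)   # where they fall; compare with the curve() plot of the scaled f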
It stops at just two subdivisions! The problem is that I cannot try
various stabilising multipliers for the function, because I have to
compute this integral thousands of times for thousands of parameter
values on thousands of sample windows for hundreds of models, so even on
the super-computer cluster this takes weeks. Besides that, reducing
rel.tol to 1e-5 or 1e-6 helped a bit, but I am not sure whether this
guarantees success (and reducing it to 1e-7 slowed down the computations
in some cases). I have also looked at the Fortran code of the quadrature
routine just to see the integration rule, and was left wondering.
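For reference, the rel.tol experiments were along these lines
(illustrative only):

integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-5)
integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-6)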
How can I make sure that the integration routine will not produce such
wrong results for such a function, while the integration remains fast?
Yours sincerely,
Andreï V. Kostyrka
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel