[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling

2019-04-14 Thread Andreï V. Kostyrka

Dear all,

This is the first time I am posting to the r-devel list. On
StackOverflow, it was suggested that this strange behaviour of integrate()
looked more like a bug. I am providing a short version of the question
(the full one, with plots: https://stackoverflow.com/q/55639401).


Suppose one wants to integrate a function that is just a product of two
density functions (gamma, say). The support of the random variable is
(-Inf, 0]. The scale parameter of the distribution is quite small
(around 0.01), so the standard integration routine would often fail to
integrate a function that is non-zero only on a very small section of the
negative half-line (like [-0.02, -0.01], where it takes huge values, and
is almost 0 everywhere else); R’s integrate would often return the machine
epsilon as a result. So I stretch the function around zero by the
inverse of the scale parameter, compute the integral, and then divide it
by the scale. Sometimes this re-scaling also failed, so I computed both
versions whenever the first result was very small.


Today, integration of the rescaled function suddenly yielded a value
of 1.5 instead of 3.5 (not even zero). The MWE is below.


cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
# Product of two gamma densities; numstab rescales the argument and
# multiplies by the Jacobian, so the value of the integral is unchanged.
f <- function(x, numstab = 1)
  dgamma(cons - x * numstab, shape = sh, scale = sc) *
    dgamma(-x * numstab, shape = sh, scale = sc) * numstab


curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # Check by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # Same true value: 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value   : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value   : num 1.5 # What?!
# $ abs.error   : num 0.000145 # What?!
# $ subdivisions: int 2

It stops at just two subdivisions! The problem is that I cannot try various
stabilising multipliers for the function, because I have to compute this
integral thousands of times for thousands of parameter values on
thousands of sample windows for hundreds of models, so even on the
super-computer cluster, this takes weeks. Besides that, tightening
rel.tol to 1e-5 or 1e-6 helped a bit, but I am not sure whether
this guarantees success (and tightening it to 1e-7 slowed down the
computations in some cases). I have also looked at the Fortran code of
the quadrature just to see the integration rule, and was left wondering.
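
For reference, a minimal sketch of the rel.tol workaround just mentioned
(the tolerances are the values I tried; I make no claim about the output):

str(integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-5))
str(integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-6))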


How can I make sure that the integration routine will not produce such
wrong results for such a function, while the integration remains fast?


Yours sincerely,
Andreï V. Kostyrka



Re: [Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling

2019-04-14 Thread William Dunlap via R-devel
integrate(f, xmin, xmax) will have problems when f(x) is 0 over large parts
of (xmin, xmax). It doesn't have any clues to where the non-zero regions
are. It computes f(x) at 21 points at each step, and if all of those are
zero (or some other constant?) for a few steps, it calls it a day. If you
can narrow down the integration interval to the interesting part of the
function's domain, you will get better results.
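
For instance, a hedged sketch of that narrowing for the MWE above -- the
1e-12 tail probabilities and the quantile-based bounds are my choice, not
anything integrate() provides:

qq <- qgamma(c(1e-12, 1 - 1e-12), shape = sh, scale = sc)
# f is non-negligible only where both gamma arguments lie inside their
# [1e-12, 1 - 1e-12] quantile range, i.e. where both
# -x and cons - x fall in [qq[1], qq[2]]:
lo <- max(cons - qq[2], -qq[2])
hi <- min(cons - qq[1], -qq[1])
str(integrate(f, lo, hi))  # finite bounds enclosing essentially all the mass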

By the way, here is a way to see where integrate(f) evaluates f() (the
keep.xy = TRUE argument doesn't seem to do anything).

> debugIntegrate <- function(f)
{
    # Wrap f in a closure that records every batch of abscissae, any
    # extra arguments, and the function values that integrate() requests.
    n_calls <- 0
    x_args <- list()
    other_args <- list()
    value <- list()
    function(x, ...) {
        n_calls <<- n_calls + 1
        x_args[[n_calls]] <<- x
        other_args[[n_calls]] <<- list(...)
        v <- f(x, ...)
        value[[n_calls]] <<- v
        v
    }
}

> str(integrate(DF <- debugIntegrate(f), -Inf, 0, numstab = sc))
List of 5
 $ value   : num 1.5
 $ abs.error   : num 0.000145
 $ subdivisions: int 2
 $ message : chr "OK"
 $ call: language integrate(f = DF <- debugIntegrate(f), lower =
-Inf, upper = 0, numstab = sc)
 - attr(*, "class")= chr "integrate"
> curve(f(x, sc), min(unlist(environment(DF)$x_args)), 0, n = 501, main =
"Scaled f", bty = "n")
> with(environment(DF), points(unlist(x_args), unlist(value)))

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Apr 14, 2019 at 5:13 AM Andreï V. Kostyrka wrote:

> [original message quoted in full; trimmed here]


Re: [Rd] stopifnot

2019-04-14 Thread Suharto Anggono Suharto Anggono via R-devel
In the current definition of function 'stopifnot' in stop.R in R 3.6.0 beta
(https://svn.r-project.org/R/branches/R-3-6-branch/src/library/base/R/stop.R)
or R-devel (https://svn.r-project.org/R/trunk/src/library/base/R/stop.R), if
'exprs' is specified, cl[[1]] is quote(stopifnot). To be more robust,
quote(base::stopifnot) may be used instead.
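
A toy illustration of the robustness concern (the masking below is
hypothetical, purely for demonstration):

stopifnot <- function(...) stop("masked stopifnot reached")
cl <- quote(stopifnot(TRUE))
tryCatch(eval(cl), error = conditionMessage)  # "masked stopifnot reached"
cl[[1]] <- quote(base::stopifnot)
eval(cl)          # NULL: base::stopifnot(TRUE) passes
rm(stopifnot)     # remove the mask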


Also, in the current definition of function 'stopifnot' in R 3.6.0 beta or
R-devel, in the construction of 'cl' when 'exprs' is specified, there is a
case with the comment "the *name* of an expression". The intent is to allow
stopifnot(exprs = ee) ,
where variable 'ee' holds an expression object, to work on the expression
object.

It is not quite right to use eval(exprs) there: it fails when 'stopifnot'
is called inside a function, like
f <- function(ee) stopifnot(exprs = ee)
f(expression())

But how about the local = FALSE case? Should the following work?
f <- function(ee) stopifnot(exprs = ee, local = FALSE)
f(expression())

But why bother making it work, while it is undocumented that the 'exprs'
argument of 'stopifnot' can be an expression object? Well, yes, that
expectation may be set by the name "exprs" itself, or by the 'exprs'
argument of function 'source' or 'withAutoprint'. Function 'withAutoprint'
may be the closest match.

Function 'withAutoprint' has an 'evaluated' argument that controls whether
it works on the value of 'exprs' or on 'exprs' as given. I like that approach.
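
A small sketch of that approach (the toy expression object 'ee' is mine):

ee <- expression(x <- 2, x + 1)
withAutoprint(exprs = ee, evaluated = TRUE)  # works on the value of 'ee'
withAutoprint(exprs = { y <- 1; y * 2 })     # default: takes 'exprs' as given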


If 'E1' is an expression object,
as.call(c(quote(stopifnot), E1))
also works, without converting 'E1' to a list.
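
For example (a small sketch; the contents of 'E1' here are made up):

E1 <- expression(is.numeric(pi), pi > 3)
cl <- as.call(c(quote(stopifnot), E1))
cl        # stopifnot(is.numeric(pi), pi > 3)
eval(cl)  # both assertions hold, so this passes silently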


I suggest arranging the "details" section in stopifnot.Rd as follows:
- This function is intended ...
- Since R version 3.5.0, stopifnot(exprs = { ... }) ...
- stopifnot(A, B) ... is conceptually equivalent to ...
- Since R version 3.5.0, expressions are evaluated sequentially ...
- Since R version 3.6.0, stopifnot no longer handles potential errors or
  warnings ... (not including the sys.call() part)
- Since R version 3.4.0, ... all.equal ...
- sys.call()

Use of sys.call() in 'stopifnot' actually dates from R 3.5.0, as the call
included in the error message produced by 'stopifnot'. In R 3.5.x it is
sys.call(-1) , which can be NULL . In the current R 3.6.0 beta it is
sys.call(sys.parent(1L)) , used only if sys.parent(1L) is not 0. The two may
differ only when 'stopifnot' is called via 'eval' or the like.
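
To illustrate the NULL case (a minimal sketch, not taken from the
stopifnot sources):

f <- function() sys.call(-1)
f()  # NULL: there is no parent call at top level
g <- function() f()
g()  # g() : the caller's call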

I think it would be good if the documentation also included an example of
using 'stopifnot' inside a function, where the error message from
'stopifnot' includes the call since R 3.5.0. Such an example is in
https://stat.ethz.ch/pipermail/r-devel/2017-May/074303.html .


On Mon, 1/4/19, Martin Maechler wrote:

 Subject: Re: [Rd] stopifnot

 Cc: r-devel@r-project.org
 Date: Monday, 1 April, 2019, 8:12 PM

> Suharto Anggono Suharto Anggono via R-devel 
>    on Sun, 31 Mar 2019 15:26:13 + writes:

[...]
[ "eval() inside for()" not giving call in error message ... ]
[...]

    > "Details" section of 'stopifnot' documentation in current R 3.6.0 alpha
    > 
(https://svn.r-project.org/R/branches/R-3-6-branch/src/library/base/man/stopifnot.Rd)
    > has this.

    >   Since \R version 3.6.0, \code{stopifnot()} no longer handles potential
    >   errors or warnings (by \code{\link{tryCatch}()} etc) for each single
    >   expression but rather aims at using the correct
    >   \code{\link{sys.call}()} to get the most meaningful error message in
    >   case of an error.  This provides considerably less overhead.

    > I think part of the first sentence starting from "but rather" should be 
removed because it is not true.

You are right that it is not accurate... I'll modify it,
keeping the "considerably less overhead" part, which had been one
important reason for changing from the 3.5.x version to the
current one.

    > The next paragraph:

    >   Since \R version 3.5.0, expressions \emph{are} evaluated sequentially,
    >   and hence evaluation stops as soon as there is a \dQuote{non-TRUE}, as
    >   indicated by the above conceptual equivalence statement.
    >   Further, when such an expression signals an error or
    >   \code{\link{warning}}, its \code{\link{conditionCall}()} no longer
    >   contains the full \code{stopifnot} call, but just the erroneous
    >   expression.

    > As I said earlier 
(https://stat.ethz.ch/pipermail/r-devel/2019-February/077386.html), the last 
sentence above is not entirely true. 

You are right to some degree: that really was true for R 3.5.x,
but it is no longer entirely accurate.

Interestingly, it is currently still true, thanks to the "eval() in for()"
behavior, that the error/warning message is most of the time only
about the relevant part and does not mention the full stopifnot(..) call.


    > It may say something like:
    > Further, when such an expression signals an error, stopifnot() in R 3.5.x 
makes its conditionCall() the erroneous expression, but no longer since R 3.6.0.


    > Is it OK that, for
    > do.call(stopifnot, list(exprs = 

Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2019-04-14 Thread Tomas Kalibera

On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey wrote:
>> I think it's worth saying that mclapply() works as documented
>
> Mostly, yes. But it says nothing about fork's copy-on-write and memory
> overcommitment, and that this means that it may work nicely or fail
> spectacularly depending on whether, e.g., you operate on a long
> vector.


R cannot possibly replicate the documentation of the underlying operating
systems. It clearly says that fork() is used, and readers who may not
know what fork() is need to learn about it from external sources.
Copy-on-write is an elementary property of fork().


Reimplementing mclapply to use PSOCK does not make sense -- if someone 
wants to write code that can be used both with PSOCK and FORK, there is 
the cluster API in parallel for that.
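
For completeness, a minimal sketch of that route; the same code runs on
either cluster type (the worker count and toy payload are arbitrary):

library(parallel)
type <- if (.Platform$OS.type == "unix") "FORK" else "PSOCK"
cl <- makeCluster(2L, type = type)
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)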


Tomas
