[Rd] Typos in file.path documentation.
Hello,

R 4.0.2 on Ubuntu 20.04, sessionInfo() below.

I believe there are two typos in ?file.path, section Value, 2nd paragraph.

1. There is a close parenthesis missing after Encoding. As it is, reading is a bit confusing; I had to backtrack and repeat.
2. I'm not a native speaker, but before a consonant it's 'a', not 'an', right?
   an component
   should be
   a component

Current:

An element of the result will be marked (see Encoding as UTF-8 if run in a UTF-8 locale (when marked inputs are converted to UTF-8) or if an component of the result is marked as UTF-8, or as Latin-1 in a non-Latin-1 locale.

Should be:

An element of the result will be marked (see Encoding) as UTF-8 if run in a UTF-8 locale (when marked inputs are converted to UTF-8) or if a component of the result is marked as UTF-8, or as Latin-1 in a non-Latin-1 locale.

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.2

Hope this helps,

Rui Barradas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] qnbinom with small size is slow
Thanks Ben for verifying the issue. It is always reassuring to hear when others can reproduce the problem.

I wrote a small patch that fixes the issue (https://github.com/r-devel/r-svn/pull/11):

diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
index b313ce56b2..d2e8d98759 100644
--- a/src/nmath/qnbinom.c
+++ b/src/nmath/qnbinom.c
@@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob, int lower_tail, int log_p)
     /* y := approx.value (Cornish-Fisher expansion) : */
     z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
     y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
+    y = fmax2(0.0, y);
     z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);

I used the https://github.com/r-devel/r-svn repo and its continuous integration tools to check that it doesn't break any existing tests: https://github.com/r-devel/r-svn/actions/runs/201327042

I have also requested a Bugzilla account, but haven't heard anything back yet.

Best,
Constantin

On Fri, 7 Aug 2020 at 21:41, Ben Bolker wrote:
>
> I can reproduce this on
>
> R Under development (unstable) (2020-07-24 r78910)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Pop!_OS 18.04 LTS
>
> In my opinion this is worth reporting, but discussing it here first
> was a good idea. Many more people read this list than watch the bug
> tracker, so it will get more attention here; once the excitement has
> died down here (which might be almost immediately!), if no-one has
> already volunteered to post it to the bug tracker, request an account
> (as specified at https://www.r-project.org/bugs.html )
>
> Thanks!
>
> Ben Bolker
>
> For what it's worth it doesn't seem to be a threshold effect: approximately
>
> log10(time[seconds]) ~ -8 - log10(-size)
>
> over the range from 1e-6 to 1e-9
>
> ff <- function(x) {
>     system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
> }
> svec <- seq(-5,-9,by=-0.2)
> res <- lapply(svec, function(x) {
>     cat(x,"\n")
>     replicate(10,ff(x))
> })
>
> dd <- data.frame(size=rep(svec,each=10),
>                  time=unlist(res))
> boxplot(log10(time)~size, dd)
> summary(lm(log10(time)~size, data=dd, subset=time>0))
>
> On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:
> > Hi all,
> >
> > I recently noticed that `qnbinom()` can take a long time to calculate
> > a result if the `size` argument is very small.
> > For example
> >     qnbinom(0.5, mu = 3, size = 1e-10)
> > takes ~30 seconds on my computer.
> >
> > I used gdb to step through the qnbinom.c implementation and noticed
> > that in line 106
> > (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> > `y` becomes a very large negative number. Later in the function `y` is
> > (as far as I can see) only used as input for `pnbinom()` which is why
> > I would assume that it should be a non-negative integer.
> >
> > I was wondering if this behavior could be considered a bug and should
> > be reported on the bugzilla? I read the instructions at
> > https://www.r-project.org/bugs.html and wasn't quite sure, so I
> > decided to ask here first :)
> >
> > Best,
> > Constantin
> >
> > PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> > version using docker (https://github.com/wch/r-debug). The session
> > info is
> >
> > > sessionInfo()
> > R Under development (unstable) (2020-08-06 r78973)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04 LTS
> >
> > Matrix products: default
> > BLAS: /usr/local/RD/lib/R/lib/libRblas.so
> > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
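To make the reported behaviour concrete, the Cornish-Fisher starting value can be replayed in plain R. This is only a sketch: the helper name `cf_start` is hypothetical, and the moment formulas are an assumption about what qnbinom.c computes just before the lines shown in the patch; only the last line of the helper mirrors the quoted C code.

```r
# Hypothetical helper (not part of R). Assumption: qnbinom.c derives
# mean, standard deviation and skewness from size and prob as below.
cf_start <- function(p, size, mu) {
  prob  <- size / (size + mu)   # convert the mu parametrisation to prob
  Q     <- 1 / prob
  P     <- (1 - prob) * Q
  m     <- size * P             # mean
  sigma <- sqrt(size * P * Q)   # standard deviation
  gamma <- (P + Q) / sigma      # skewness
  z     <- qnorm(p)
  # same expression as the quoted line computing y in qnbinom.c
  round(m + sigma * (z + gamma * (z * z - 1) / 6))
}

y_raw     <- cf_start(0.5, size = 1e-10, mu = 3)
y_clamped <- max(0, y_raw)      # effect of the added fmax2(0.0, y) line
print(c(raw = y_raw, clamped = y_clamped))
```

With p = 0.5 the normal quantile z is 0, so the expression reduces to m - (P + Q)/6, which for size = 1e-10 is on the order of -1e10. If the subsequent search then moves toward the answer in small steps from that starting point, that would be consistent with the long run time reported above; clamping at zero removes the negative start.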
Re: [Rd] lm() takes weights from formula environment
I wish I had started with "I am disappointed that lm() doesn't continue its search for weights into the calling environment" or "the fact that lm() looks only in the formula environment and data frame for weights doesn't seem consistent with how other values are treated."

But I did not. So I do apologize for both that and for negative tone on my part.

Simplified example:

    d <- data.frame(x = 1:3, y = c(1, 2, 1))
    w <- c(1, 10, 1)
    f <- as.formula(y ~ x)
    lm(f, data = d, weights = w)  # works

    # fails
    environment(f) <- baseenv()
    lm(f, data = d, weights = w)
    # Error in eval(extras, data, env) : object 'w' not found

> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch wrote:
>
> This is fairly clearly documented in ?lm:
>
Re: [Rd] lm() takes weights from formula environment
I assume you are concerned about this because the formula is defined in one environment and the model fitting with weights occurs in a separate function. If that is the case then the model fitting function can create a new environment, a child of the formula's environment, add the weights variable to it, and make that the new environment of the formula. (This new environment is only an attribute of the copy of the formula in the model fitting function: it will not affect the formula outside of that function.) E.g.,

    d <- data.frame(x = 1:3, y = c(1, 2, 1))

    lmWithWeightsBad <- function(formula, data, weights) {
        lm(formula, data=data, weights=weights)
    }
    coef(lmWithWeightsBad(y~x, data=d, weights=c(2,5,1)))
    # lm finds the 'weights' function in package:stats
    #Error in model.frame.default(formula = formula, data = data, weights = weights, :
    #  invalid type (closure) for variable '(weights)'

    lmWithWeightsGood <- function(formula, data, weights) {
        envir <- new.env(parent = environment(formula))
        envir$weights <- weights
        environment(formula) <- envir
        lm(formula, data=data, weights=weights)
    }
    coef(lmWithWeightsGood(y~x, data=d, weights=c(2,5,1)))
    #(Intercept)           x
    #  1.2173913   0.2173913

Bill Dunlap
TIBCO Software
wdunlap tibco.com
Re: [Rd] lm() takes weights from formula environment
On 10/08/2020 1:42 p.m., John Mount wrote:
> I wish I had started with "I am disappointed that lm() doesn't continue its
> search for weights into the calling environment" or "the fact that lm() looks
> only in the formula environment and data frame for weights doesn't seem
> consistent with how other values are treated."

Normally searching is done automatically by following a chain of environments. It's easy to add something to the head of the chain (e.g. data), it's hard to add something in the middle or at the end (because the chain ends with emptyenv(), which is not allowed to have a parent).

So I'd suggest using

    environment(f) <- environment()

before calling lm() if you want the calling environment to be in the search. Setting it to baseenv() doesn't really make sense, unless you want to disable all searches except in data, in which case emptyenv() would make more sense (but I haven't tried it, so it might break something).

Duncan Murdoch
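A minimal sketch of that suggestion, reusing the data from the simplified example in this thread. The wrapper name `fit_with_local_weights` is hypothetical and the weights are placed inside the wrapper purely for illustration.

```r
d <- data.frame(x = 1:3, y = c(1, 2, 1))

# Hypothetical wrapper: the weights live only in this function's frame.
fit_with_local_weights <- function(f, data) {
  w <- c(1, 10, 1)
  # Point the local copy of the formula at the current frame, so that
  # lm() can find 'w' when it evaluates the weights argument.
  environment(f) <- environment()
  lm(f, data = data, weights = w)
}

f <- y ~ x
environment(f) <- baseenv()   # same starting point as the failing example
fit <- fit_with_local_weights(f, d)
print(coef(fit))
```

Because formulas are copied on modification, the caller's `f` keeps its original environment; only the copy inside the wrapper is changed.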
Re: [Rd] lm() takes weights from formula environment
Thank you for your suggestion. I do know how to work around the issue. I usually build a fresh environment as a child of the base environment and then insert the weights there. I was just trying to provide an example of the issue.

emptyenv() cannot be used, because the eval needs an environment in which functions can be found (it errors out with "could not find function list" even if weights are not used).

For some applications one doesn't want the formula to have a non-trivial environment with respect to serialization. Nina Zumel wrote about reference leaks in lm()/glm() and a good part of that was environments other than global/base (such as those formed when building a formula in a function) capturing references to unrelated structures.
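The workaround described above (a fresh environment whose parent is the base environment, holding only the weights) might look like the following sketch. The function name `fit_clean_env` is hypothetical; this is one reading of the description, not code from the original message.

```r
d <- data.frame(x = 1:3, y = c(1, 2, 1))

# Hypothetical helper: give the formula a small, purpose-built environment.
fit_clean_env <- function(f, data, w) {
  env <- new.env(parent = baseenv())  # functions such as list() stay reachable
  assign("w", w, envir = env)         # the only object the formula can capture
  environment(f) <- env
  lm(f, data = data, weights = w)     # 'w' is looked up via environment(f)
}

fit <- fit_clean_env(y ~ x, d, w = c(1, 10, 1))
print(coef(fit))
print(ls(environment(fit$terms)))
```

The point of the design is that the environment stored with the fitted model's terms contains only the weights, rather than whatever happened to be in the frame where the formula was created.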
Re: [Rd] lm() takes weights from formula environment
Forgot the url: https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/
[Rd] M[cbind()] <- assignment with Matrix object loses attributes
Does this constitute a bug, or is there something I'm missing? Assigning sub-elements of a sparse Matrix via M[X] <- ..., where X is a 2-column matrix, appears to drop user-assigned attributes. I dug around in the R code for Matrix trying to find the relevant machinery but my brain started to hurt too badly ...

Will submit this as a bug if it seems warranted.

    library(Matrix)
    m1 <- matrix(1:9,3,3)
    m1 <- Matrix(m1)
    attr(m1,"junk") <- 12
    stopifnot(isTRUE(attr(m1,"junk")==12))  ## OK
    m1[cbind(1:2,2:3)] <- 1
    stopifnot(isTRUE(attr(m1,"junk")==12))  ## not OK
    attr(m1,"junk")  ## NULL

    ## note I have to use the ugly stopifnot(isTRUE(...)) because a missing
    ## attribute returns NULL, a comparison with NULL returns logical(0),
    ## and stopifnot(logical(0)) doesn't stop ...

cheers
Ben Bolker
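The reason for the isTRUE() wrapper can be shown with base R alone. This is a small illustrative sketch using a plain integer vector instead of a Matrix object; the attribute name "nonexistent" is made up for the example.

```r
x <- 1:3
attr(x, "junk") <- 12

missing_attr <- attr(x, "nonexistent")  # an attribute that is not set
cmp <- missing_attr == 12               # comparing NULL yields a zero-length logical

stopifnot(cmp)          # passes silently: zero-length conditions count as true
print(isTRUE(cmp))      # isTRUE() requires a single TRUE, so this guards the check

# identical() is another way to get a strict single TRUE/FALSE
print(identical(attr(x, "junk"), 12))
```

Either isTRUE(... == 12) or identical(..., 12) turns a silently passing check on a missing attribute into a failing one.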