[Rd] Typos in file.path documentation.

2020-08-10 Thread Rui Barradas

Hello,

R 4.0.2 on Ubuntu 20.04, sessionInfo() below.

I believe there are two typos in ?file.path, section Value, 2nd paragraph.

1. A closing parenthesis is missing after Encoding. As it stands, the 
sentence is a bit confusing to read; I had to backtrack and reread it.
2. I'm not a native speaker, but before a consonant it's 'a', 
not 'an', right?


an component

should be

a component


Current:

An element of the result will be marked (see Encoding as UTF-8 if run in 
a UTF-8 locale (when marked inputs are converted to UTF-8) or if an 
component of the result is marked as UTF-8, or as Latin-1 in a 
non-Latin-1 locale.


Should be:

An element of the result will be marked (see Encoding) as UTF-8 if run 
in a UTF-8 locale (when marked inputs are converted to UTF-8) or if a 
component of the result is marked as UTF-8, or as Latin-1 in a 
non-Latin-1 locale.
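
A small way to observe the marking described in that paragraph is sketched below. It is only an illustration: the path components are made up, and the result for the last call is what the documentation text leads one to expect rather than something guaranteed for every locale.

```r
# Plain ASCII components carry no encoding mark.
p_ascii <- file.path("dir", "file.txt")
Encoding(p_ascii)

# A string written with a \u escape is marked as UTF-8.
x <- "caf\u00e9"
Encoding(x)

# Per the documentation, a path built from a UTF-8-marked component
# should itself be marked as UTF-8.
p_utf8 <- file.path("dir", x)
Encoding(p_utf8)
```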



sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2


Hope this helps,

Rui Barradas

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] qnbinom with small size is slow

2020-08-10 Thread Constantin Ahlmann-Eltze via R-devel
Thanks Ben for verifying the issue. It is always reassuring to hear
when others can reproduce the problem.

I wrote a small patch that fixes the issue
(https://github.com/r-devel/r-svn/pull/11):

diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
index b313ce56b2..d2e8d98759 100644
--- a/src/nmath/qnbinom.c
+++ b/src/nmath/qnbinom.c
@@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob, int lower_tail, int log_p)
 /* y := approx.value (Cornish-Fisher expansion) :  */
 z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
 y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
+y = fmax2(0.0, y);

 z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);
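
To see why the clamp matters, the starting value can be recomputed in R. This is a rough sketch under assumptions: `cf_start()` is a hypothetical helper that mirrors the Cornish-Fisher step as it appears in the diff context, with `prob` derived from `mu` as `size / (size + mu)`. It is not part of R and skips everything else qnbinom() does.

```r
# Hypothetical helper: Cornish-Fisher starting value for the quantile search,
# following the expressions visible in the patch context.
cf_start <- function(p, size, mu) {
  prob  <- size / (size + mu)
  Q     <- 1 / prob
  P     <- (1 - prob) * Q
  sigma <- sqrt(size * P * Q)
  gamma <- (P + Q) / sigma
  z     <- qnorm(p)
  round(mu + sigma * (z + gamma * (z * z - 1) / 6))
}

y_raw     <- cf_start(0.5, size = 1e-10, mu = 3)
y_clamped <- max(0, y_raw)   # R-level analogue of y = fmax2(0.0, y)

y_raw
y_clamped
```

With p = 0.5 the normal quantile z is 0, so the start reduces to roughly mu - (P + Q)/6, which is hugely negative when size is tiny. That matches the very large negative `y` seen in gdb.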

I used the https://github.com/r-devel/r-svn repo and its continuous
integration tools to check that it doesn't break any existing tests:
https://github.com/r-devel/r-svn/actions/runs/201327042

I have also requested a Bugzilla account, but haven't heard anything back yet.

Best,
Constantin

On Fri, 7 Aug 2020 at 21:41, Ben Bolker wrote:
>
> I can reproduce this on
>
> R Under development (unstable) (2020-07-24 r78910)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Pop!_OS 18.04 LTS
>
> In my opinion this is worth reporting, but discussing it here first
> was a good idea.  Many more people read this list than watch the bug
> tracker, so it will get more attention here; once the excitement has
> died down here (which might be almost immediately!), if no-one has
> already volunteered to post it to the bug tracker, request an account
> (as specified at https://www.r-project.org/bugs.html )
>
> Thanks!
>
> Ben Bolker
>
>
> For what it's worth it doesn't seem to be a threshold effect: approximately
>
> log10(time[seconds]) ~ -8 - log10(size)
>
> over the range from 1e-6 to 1e-9
>
>
> ff <- function(x) {
>     system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
> }
> svec <- seq(-5,-9,by=-0.2)
> res <- lapply(svec, function(x) {
>     cat(x,"\n")
>     replicate(10,ff(x))
> })
>
> dd <- data.frame(size=rep(svec,each=10),
>                  time=unlist(res))
> boxplot(log10(time)~size, dd)
> summary(lm(log10(time)~size, data=dd, subset=time>0))
>
>
>
>
> On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:
>
> > Hi all,
> >
> > I recently noticed that `qnbinom()` can take a long time to calculate
> > a result if the `size` argument is very small.
> > For example
> > qnbinom(0.5, mu = 3, size = 1e-10)
> > takes ~30 seconds on my computer.
> >
> > I used gdb to step through the qnbinom.c implementation and noticed
> > that in line 106
> > (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> > `y` becomes a very large negative number. Later in the function `y` is
> > (as far as I can see) only used as input for `pnbinom()` which is why
> > I would assume that it should be a non-negative integer.
> >
> > I was wondering if this behavior could be considered a bug and should
> > be reported on the bugzilla? I read the instructions at
> > https://www.r-project.org/bugs.html and wasn't quite sure, so I
> > decided to ask here first :)
> >
> > Best,
> > Constantin
> >
> >
> >
> >
> > PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> > version using docker (https://github.com/wch/r-debug). The session
> > info is
> >> sessionInfo()
> > R Under development (unstable) (2020-08-06 r78973)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04 LTS
> >
> > Matrix products: default
> > BLAS:   /usr/local/RD/lib/R/lib/libRblas.so
> > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
> >


Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread John Mount
I wish I had started with "I am disappointed that lm() doesn't continue its 
search for weights into the calling environment" or "the fact that lm() looks 
only in the formula environment and data frame for weights doesn't seem 
consistent with how other values are treated."

But I did not. So I do apologize for both that and for negative tone on my part.


Simplified example:

d <- data.frame(x = 1:3, y = c(1, 2, 1))
w <- c(1, 10, 1)
f <- as.formula(y ~ x)
lm(f, data = d, weights = w)  # works

# fails
environment(f) <- baseenv()
lm(f, data = d, weights = w)
# Error in eval(extras, data, env) : object 'w' not found


> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch  wrote:
> 
> This is fairly clearly documented in ?lm:
> 



Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread William Dunlap via R-devel
I assume you are concerned about this because the formula is defined
in one environment and the model fitting with weights occurs in a
separate function.  If that is the case then the model fitting
function can create a new environment, a child of the formula's
environment, add the weights variable to it, and make that the new
environment of the formula.  (This new environment is only an
attribute of the copy of the formula in the model fitting function: it
will not affect the formula outside of that function.)  E.g.,


d <- data.frame(x = 1:3, y = c(1, 2, 1))

lmWithWeightsBad <- function(formula, data, weights) {
    lm(formula, data=data, weights=weights)
}
# lm finds the 'weights' function in package:stats
coef(lmWithWeightsBad(y~x, data=d, weights=c(2,5,1)))
#Error in model.frame.default(formula = formula, data = data, weights = weights,  :
#  invalid type (closure) for variable '(weights)'

lmWithWeightsGood <- function(formula, data, weights) {
    envir <- new.env(parent = environment(formula))
    envir$weights <- weights
    environment(formula) <- envir
    lm(formula, data=data, weights=weights)
}
coef(lmWithWeightsGood(y~x, data=d, weights=c(2,5,1)))
#(Intercept)           x
#  1.2173913   0.2173913

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Aug 10, 2020 at 10:43 AM John Mount  wrote:
>
> I wish I had started with "I am disappointed that lm() doesn't continue its 
> search for weights into the calling environment" or "the fact that lm() looks 
> only in the formula environment and data frame for weights doesn't seem 
> consistent with how other values are treated."
>
> But I did not. So I do apologize for both that and for negative tone on my 
> part.
>
>
> Simplified example:
>
> d <- data.frame(x = 1:3, y = c(1, 2, 1))
> w <- c(1, 10, 1)
> f <- as.formula(y ~ x)
> lm(f, data = d, weights = w)  # works
>
> # fails
> environment(f) <- baseenv()
> lm(f, data = d, weights = w)
> # Error in eval(extras, data, env) : object 'w' not found
>
>
> > On Aug 9, 2020, at 11:56 AM, Duncan Murdoch  
> > wrote:
> >
> > This is fairly clearly documented in ?lm:
> >
>


Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread Duncan Murdoch

On 10/08/2020 1:42 p.m., John Mount wrote:

> I wish I had started with "I am disappointed that lm() doesn't continue its
> search for weights into the calling environment" or "the fact that lm() looks
> only in the formula environment and data frame for weights doesn't seem
> consistent with how other values are treated."


Normally searching is done automatically by following a chain of 
environments.  It's easy to add something to the head of the chain (e.g. 
data), it's hard to add something in the middle or at the end (because 
the chain ends with emptyenv(), which is not allowed to have a parent).


So I'd suggest using

 environment(f) <- environment()

before calling lm() if you want the calling environment to be in the 
search.  Setting it to baseenv() doesn't really make sense, unless you 
want to disable all searches except in data, in which case emptyenv() 
would make more sense (but I haven't tried it, so it might break something).
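
A minimal sketch of that suggestion, applied inside a function so that the weights exist only in the calling frame. The wrapper name `fit_weighted` and its arguments are hypothetical, chosen just for illustration.

```r
d <- data.frame(x = 1:3, y = c(1, 2, 1))

# Hypothetical wrapper: the weights 'w' live only in this function's frame.
fit_weighted <- function(f, data, w) {
  # Re-point the local copy of the formula at the current frame, so that
  # model.frame() can find 'w' when it evaluates the weights argument.
  environment(f) <- environment()
  lm(f, data = data, weights = w)
}

f <- y ~ x
environment(f) <- baseenv()   # reproduce the setup that fails at top level

fit <- fit_weighted(f, d, c(1, 10, 1))
coef(fit)
```

Only the copy of the formula inside the function is modified; the caller's `f` keeps whatever environment it had.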


Duncan Murdoch





Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread John Mount
Thank you for your suggestion. I do know how to work around the issue.  I 
usually build a fresh environment as a child of the base environment and then 
insert the weights there. I was just trying to provide an example of the issue.

emptyenv() cannot be used, as the enclosing environment is needed for the eval 
(it errors out with "could not find function list" even if weights are not used).

For some applications one doesn't want the formula to have a non-trivial 
environment with respect to serialization.  Nina Zumel wrote about reference 
leaks in lm()/glm() and a good part of that was environments other than 
global/base (such as those formed when building a formula in a function) 
capturing references to unrelated structures.
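
The workaround described above might look roughly like the following. It is a sketch of the idea rather than code from any package, and the variable name `env` is arbitrary.

```r
d <- data.frame(x = 1:3, y = c(1, 2, 1))
w <- c(1, 10, 1)
f <- y ~ x

# Fresh environment whose parent is the base environment: base functions
# such as list() are still visible, but nothing from the caller is captured.
env <- new.env(parent = baseenv())
env$w <- w
environment(f) <- env

fit <- lm(f, data = d, weights = w)
coef(fit)

# The formula's environment now holds only the weights.
ls(environment(f))
```

Because the environment contains only `w`, a model that keeps a reference to it should not drag unrelated objects along when it is saved.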



> On Aug 10, 2020, at 11:34 AM, Duncan Murdoch  wrote:
> 
> On 10/08/2020 1:42 p.m., John Mount wrote:
>> I wish I had started with "I am disappointed that lm() doesn't continue its 
>> search for weights into the calling environment" or "the fact that lm() 
>> looks only in the formula environment and data frame for weights doesn't 
>> seem consistent with how other values are treated."
> 
> Normally searching is done automatically by following a chain of 
> environments.  It's easy to add something to the head of the chain (e.g. 
> data), it's hard to add something in the middle or at the end (because the 
> chain ends with emptyenv(), which is not allowed to have a parent).
> 
> So I'd suggest using
> 
> environment(f) <- environment()
> 
> before calling lm() if you want the calling environment to be in the search.  
> Setting it to baseenv() doesn't really make sense, unless you want to disable 
> all searches except in data, in which case emptyenv() would make more sense 
> (but I haven't tried it, so it might break something).
> 
> Duncan Murdoch
> 
>> But I did not. So I do apologize for both that and for negative tone on my 
>> part.
>> Simplified example:
>> d <- data.frame(x = 1:3, y = c(1, 2, 1))
>> w <- c(1, 10, 1)
>> f <- as.formula(y ~ x)
>> lm(f, data = d, weights = w)  # works
>> # fails
>> environment(f) <- baseenv()
>> lm(f, data = d, weights = w)
>> # Error in eval(extras, data, env) : object 'w' not found
>>> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch  
>>> wrote:
>>> 
>>> This is fairly clearly documented in ?lm:
>>> 
> 



Re: [Rd] lm() takes weights from formula environment

2020-08-10 Thread John Mount
Forgot the url: 
https://win-vector.com/2014/05/30/trimming-the-fat-from-glm-models-in-r/



[Rd] M[cbind()] <- assignment with Matrix object loses attributes

2020-08-10 Thread Ben Bolker
  Does this constitute a bug, or is there something I'm missing? 
Assigning sub-elements of a sparse Matrix via M[X]<-..., where X is a 
2-column matrix, appears to drop user-assigned attributes. I dug around 
in the R code for Matrix trying to find the relevant machinery but my 
brain started to hurt too badly ...


   Will submit this as a bug if it seems warranted.

library(Matrix)
m1 <- matrix(1:9,3,3)
m1 <- Matrix(m1)
attr(m1,"junk") <- 12
stopifnot(isTRUE(attr(m1,"junk")==12))  ## OK
m1[cbind(1:2,2:3)] <- 1
stopifnot(isTRUE(attr(m1,"junk")==12)) ## not OK
attr(m1,"junk") ## NULL


## note I have to use the ugly stopifnot(isTRUE(...)) because a missing 
attribute returns NULL, an assignment to NULL returns NULL, and 
stopifnot(NULL) doesn't stop ...
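
As an aside on the check itself, one possible alternative to stopifnot(isTRUE(...)) is identical(), which always yields a single TRUE or FALSE. The sketch below uses a plain integer vector rather than a Matrix object, and the attribute name "nope" is made up, so it only illustrates the testing idiom and not the reported behaviour.

```r
x <- 1:3
attr(x, "junk") <- 12

# A missing attribute is NULL, and comparing NULL with == gives a
# zero-length logical rather than FALSE.
missing_cmp <- attr(x, "nope") == 12
length(missing_cmp)

# identical() returns exactly one TRUE or FALSE, so it is safe in stopifnot().
has_junk <- identical(attr(x, "junk"), 12)
has_nope <- identical(attr(x, "nope"), 12)
stopifnot(has_junk, !has_nope)
```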



   cheers

 Ben Bolker
