[Rd] naresid.exclude query
x <- NA na.act <- na.action(na.exclude(x)) y <- rep(0,0) naresid(na.act,y) ... currently produces the result... numeric(0) ... whereas the documentation might lead you to expect NA The behaviour is caused by the line if (length(x) == 0L) return(x) in `stats:::naresid.exclude'. Removing this line results in the behaviour I'd expected in the above example (and in a test example where `x' is a zero row matrix). Is the coded behaviour necessary for some reason? Could it be changed (so that my example produces NA)? The reason I ask is that I use `napredict' in mgcv:predict.gam, and someone complained that if he predicts using newdata that is all NA, then he doesn't get what he expected (he has a pretty good reason for doing this). Part of the problem with predict.gam in this case was my code, but once I fixed that I ran up against the above problem. Actually, I just checked the source code and the line of naresid.exclude that causes the problem already has this comment after it # << FIXME? -- reconstructing all NA object ... so I guess I'm really asking if there is any chance of fixing this soon, or whether I should just code up a work-around for predict.gam? Simon -- > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK > +44 1225 386603 www.maths.bath.ac.uk/~sw283 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] glm(...,quasi(link="logit")) problem?
There may be a good reason for this, but under R 2.5.1 (Windows XP, Suse Linux) > glm(c(0,.5)~1,quasi(link=logit)) Error: NA/NaN/Inf in foreign function call (arg 4) whereas the same code works under R 2.1.1. The problem (in 2.5.1) is a -Inf in the pseudodata `z' in glm.fit, as a result of a `-Inf' in eta, which is in turn generated by `mustart <- y' and `logit(mustart)'. In 2.1.1 the line `mustart <- y + 0.1 * (y == 0)' is what avoids the problem. A possible fix might be to set elements of `good' to FALSE for non-finite `eta', for the first fit iteration only, of course. Is it worth fixing? I found this after someone reported a problem with mgcv::gamm (which calls MASS:glmmPQL which calls glm), which is why I'm not just supplying my own mustart and getting on with it... best, Simon >- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY >- +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] get_all_vars fails with matrices (PR#13624)
It's not just lists that are odd... > X <- matrix(1:15,5,3) > z <- 26:30 > model.frame(~z+X) z X.1 X.2 X.3 1 26 1 6 11 2 27 2 7 12 3 28 3 8 13 4 29 4 9 14 5 30 5 10 15 > get_all_vars(~z+X) [1] zX ... which is again a problem when trying to pick up unprocessed versions of the variables used by a modelling function On Wednesday 25 March 2009 18:54, Peter Dalgaard wrote: > s.w...@bath.ac.uk wrote: > > Hi, > > > > According to the help file for model.frame/get_all_vars, the following > > should produce the same output from both functions, but it doesn't... > > > >> dat <- list(X=matrix(1:15,5,3),z=26:30) > >> model.frame(~z+X,dat) > > > >z X.1 X.2 X.3 > > 1 26 1 6 11 > > 2 27 2 7 12 > > 3 28 3 8 13 > > 4 29 4 9 14 > > 5 30 5 10 15 > > > >> get_all_vars(~z+X,dat) > > > > [1] zX > > <0 rows> (or 0-length row.names) > > > > -- the equivalent works ok if there are no matrices involved. > > > > I'm using R version 2.9.0 alpha (2009-03-24 r48212) (Suse linux 10 and > > 11, 64 bit intel). I found the problem while trying to fix a problem in > > an mgcv plotting routine. > > > > best, > > Simon > > This works, though: > > dat <- data.frame(X=I(matrix(1:15,5,3)),z=26:30) > > get_all_vars(~z+X,dat) > > z X.1 X.2 X.3 > 1 26 1 6 11 > 2 27 2 7 12 > 3 28 3 8 13 > 4 29 4 9 14 > 5 30 5 10 15 > > but there is something special with lists: > > dat <- as.data.frame(list(X=I(matrix(1:15,5,3)),z=26:30)) > > get_all_vars(~z+X,dat) > > z X.1 X.2 X.3 > 1 26 1 6 11 > 2 27 2 7 12 > 3 28 3 8 13 > 4 29 4 9 14 > 5 30 5 10 15 > > > dat <- data.frame(list(X=I(matrix(1:15,5,3)),z=26:30)) > > get_all_vars(~z+X,dat) > > z X.1 X.2 X.3 > 1 26 1 6 11 > 2 27 2 7 12 > 3 28 3 8 13 > 4 29 4 9 14 > 5 30 5 10 15 > > > dat <- list(X=I(matrix(1:15,5,3)),z=26:30) > > get_all_vars(~z+X,dat) > > [1] z X > <0 rows> (or 0-length row.names) -- > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK > +44 1225 386603 www.maths.bath.ac.uk/~sw283 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] gaussian family change suggestion
Hi, Currently the `gaussian' family's initialization code signals an error if any response data are zero or negative and a log link is used. Given that zero or negative response data are perfectly legitimate under the GLM fitted using `gaussian("log")', this seems a bit unsatisfactory. Might it be worth changing it? The current offending code from `gaussian' is: initialize = expression({ n <- rep.int(1, nobs) if (is.null(etastart) && is.null(start) && is.null(mustart) && ((family$link == "inverse" && any(y == 0)) || (family$link == "log" && any(y <= 0 stop( "cannot find valid starting values: please specify some") mustart <- y }) A possible replacement would be ... initialize = expression({ n <- rep.int(1, nobs) if (is.null(etastart) && is.null(start) && is.null(mustart) && ((family$link == "inverse" && all(y == 0)) || (family$link == "log" && all(y <= 0 stop( "cannot find valid starting values: please specify some") mustart <- y if (family$link=="log") { iy <- y<=0 if (sum(iy)) mustart[iy] <- min(y[!iy])*.5 } else if (family$link=="inverse") { iy <- y==0 if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5 } }) best, Simon >- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY >- +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/ __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] gaussian family change suggestion
> > Currently the `gaussian' family's initialization code signals an error if > > any response data are zero or negative and a log link is used. Given that > > zero or negative response data are perfectly legitimate under the GLM > > fitted using `gaussian("log")', this seems a bit unsatisfactory. Might > > it be worth changing it? > > Well, that's not really true: it says it is unable to find starting > values, and asks you to provide some. I don't really see why your choices > are satisfactory: what happens if all the data are negative for example? - then the suggested code traps it. > > We could try even harder, but code that is almost never used tends to get > broken whilst no one is testing it. So if you want to pursue it I think > we need a comprehensive test suite. - well, I guess, but I wouldn't have thought that zeros in data modelled using `gaussian("log")' is such a rare occurance is it? [Or did you mean that `gaussian("log")' is almost never used, and should hence be kept simple]. I suppose there are arguments both ways... best, Simon > > > The current offending code from `gaussian' is: > > > > initialize = expression({ > >n <- rep.int(1, nobs) > >if (is.null(etastart) && is.null(start) && is.null(mustart) && > >((family$link == "inverse" && any(y == 0)) || > > (family$link == "log" && any(y <= 0 stop( > > "cannot find valid starting values: please specify some") > >mustart <- y > >}) > > > > A possible replacement would be ... > > > > initialize = expression({ > >n <- rep.int(1, nobs) > >if (is.null(etastart) && is.null(start) && is.null(mustart) && > >((family$link == "inverse" && all(y == 0)) || > > (family$link == "log" && all(y <= 0 stop( > > "cannot find valid starting values: please specify some") > >mustart <- y > > if (family$link=="log") { > >iy <- y<=0 > >if (sum(iy)) mustart[iy] <- min(y[!iy])*.5 > > } else if (family$link=="inverse") { > >iy <- y==0 > >if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5 > > } > >}) > > > > best, > > Simon > > > > > >> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY > >> - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/ > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > > -- > Brian D. Ripley, [EMAIL PROTECTED] > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UKFax: +44 1865 272595 > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] gaussian family change suggestion
> >>> Currently the `gaussian' family's initialization code signals an error if > >>> any response data are zero or negative and a log link is used. Given that > >>> zero or negative response data are perfectly legitimate under the GLM > >>> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might > >>> it be worth changing it? > >> > >> Well, that's not really true: it says it is unable to find starting > >> values, and asks you to provide some. I don't really see why your choices > >> are satisfactory: what happens if all the data are negative for example? > > > > - then the suggested code traps it. > > Yes, but why (it can happen too)? - isn't this (i.e. all response <=0 ) a case where the MLE's of the linear predictor must tend to minus infinity? So there's probably an argument for handling it differently [although my proposed error message is not the most informative, since modifying the starting values won't help]. best, Simon > > >> We could try even harder, but code that is almost never used tends to get > >> broken whilst no one is testing it. So if you want to pursue it I think > >> we need a comprehensive test suite. > > > > - well, I guess, but I wouldn't have thought that zeros in data modelled > > using `gaussian("log")' is such a rare occurance is it? [Or did you mean > > that `gaussian("log")' is almost never used, and should hence be kept > > simple]. > > I meant initialization code for this case would rarely get used, as I > suppose those people who do this are used to providing starting values > (or there are few or no such people). > > > > > I suppose there are arguments both ways... > > > > best, > > Simon > > > >> > >>> The current offending code from `gaussian' is: > >>> > >>> initialize = expression({ > >>>n <- rep.int(1, nobs) > >>>if (is.null(etastart) && is.null(start) && is.null(mustart) && > >>>((family$link == "inverse" && any(y == 0)) || > >>> (family$link == "log" && any(y <= 0 stop( > >>> "cannot find valid starting values: please specify some") > >>>mustart <- y > >>>}) > >>> > >>> A possible replacement would be ... > >>> > >>> initialize = expression({ > >>>n <- rep.int(1, nobs) > >>>if (is.null(etastart) && is.null(start) && is.null(mustart) && > >>>((family$link == "inverse" && all(y == 0)) || > >>> (family$link == "log" && all(y <= 0 stop( > >>> "cannot find valid starting values: please specify some") > >>>mustart <- y > >>> if (family$link=="log") { > >>>iy <- y<=0 > >>>if (sum(iy)) mustart[iy] <- min(y[!iy])*.5 > >>> } else if (family$link=="inverse") { > >>>iy <- y==0 > >>>if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5 > >>> } > >>>}) > >>> > >>> best, > >>> Simon > >>> > >>> > >>>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY > >>>> - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/ > >>> > >>> __ > >>> R-devel@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>> > >>> > >> > >> -- > >> Brian D. Ripley, [EMAIL PROTECTED] > >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > >> University of Oxford, Tel: +44 1865 272861 (self) > >> 1 South Parks Road, +44 1865 272866 (PA) > >> Oxford OX1 3TG, UKFax: +44 1865 272595 > >> > > > > > > -- > Brian D. Ripley, [EMAIL PROTECTED] > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UKFax: +44 1865 272595 > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel