[Rd] naresid.exclude query

2011-01-14 Thread Simon Wood
  x <- NA
  na.act <- na.action(na.exclude(x))
  y <- rep(0,0)
  naresid(na.act,y)

... currently produces the result...
  numeric(0)

... whereas the documentation might lead you to expect
  NA

The behaviour is caused by the line 
  if (length(x) == 0L) return(x)

in `stats:::naresid.exclude'. Removing this line results in the behaviour I'd 
expected in the above example (and in a test example where `x' is a zero row 
matrix). 

Is the coded behaviour necessary for some reason? Could it be changed (so that 
my example produces NA)? The reason I ask is that I use `napredict' in 
mgcv:predict.gam, and someone complained that if he predicts using newdata 
that is all NA, then he doesn't get what he expected (he has a pretty good 
reason for doing this). Part of the problem with predict.gam in this case was 
my code, but once I fixed that I ran up against the above problem.  

Actually, I just checked the source code and the line of naresid.exclude that 
causes the problem already has this comment after it
  # << FIXME? -- reconstructing all NA object
... so I guess I'm really asking if there is any chance of fixing this soon, 
or whether I should just code up a work-around for predict.gam?

Simon

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] glm(...,quasi(link="logit")) problem?

2007-07-25 Thread Simon Wood
There may be a good reason for this, but under R 2.5.1 (Windows XP, Suse 
Linux)

> glm(c(0,.5)~1,quasi(link=logit))
Error: NA/NaN/Inf in foreign function call (arg 4)

whereas the same code works under R 2.1.1.

The problem (in 2.5.1) is a -Inf in the pseudodata `z' in glm.fit, as a 
result of a `-Inf' in eta, which is in turn generated by `mustart <- y' and 
`logit(mustart)'. In 2.1.1 the line `mustart <- y + 0.1 * (y == 0)' is 
what avoids the problem.

A possible fix might be to set elements of `good' to FALSE for non-finite 
`eta', for the first fit iteration only, of course. Is it worth fixing? I 
found this after  someone reported a problem with mgcv::gamm (which 
calls MASS:glmmPQL which calls glm), which is why I'm not just supplying 
my own mustart and getting on with it...

best,
Simon


>- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>- +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] get_all_vars fails with matrices (PR#13624)

2009-03-26 Thread Simon Wood
It's not just lists that are odd...

> X <- matrix(1:15,5,3)
> z <- 26:30
> model.frame(~z+X)
   z X.1 X.2 X.3
1 26   1   6  11
2 27   2   7  12
3 28   3   8  13
4 29   4   9  14
5 30   5  10  15
> get_all_vars(~z+X)
[1] zX 

... which is again a problem when trying to pick up unprocessed versions of 
the variables used by a modelling function   


On Wednesday 25 March 2009 18:54, Peter Dalgaard wrote:
> s.w...@bath.ac.uk wrote:
> > Hi,
> >
> > According to the help file for model.frame/get_all_vars, the following
> > should produce the same output from both functions, but it doesn't...
> >
> >> dat <- list(X=matrix(1:15,5,3),z=26:30)
> >> model.frame(~z+X,dat)
> >
> >z X.1 X.2 X.3
> > 1 26   1   6  11
> > 2 27   2   7  12
> > 3 28   3   8  13
> > 4 29   4   9  14
> > 5 30   5  10  15
> >
> >> get_all_vars(~z+X,dat)
> >
> > [1] zX 
> > <0 rows> (or 0-length row.names)
> >
> > -- the equivalent works ok if there are no matrices involved.
> >
> > I'm using  R version 2.9.0 alpha (2009-03-24 r48212) (Suse linux 10 and
> > 11, 64 bit intel). I found the problem while trying to fix a problem in
> > an mgcv plotting routine.
> >
> > best,
> > Simon
>
> This works, though:
>  > dat <- data.frame(X=I(matrix(1:15,5,3)),z=26:30)
>  > get_all_vars(~z+X,dat)
>
> z X.1 X.2 X.3
> 1 26   1   6  11
> 2 27   2   7  12
> 3 28   3   8  13
> 4 29   4   9  14
> 5 30   5  10  15
>
> but there is something special with lists:
>  > dat <- as.data.frame(list(X=I(matrix(1:15,5,3)),z=26:30))
>  > get_all_vars(~z+X,dat)
>
> z X.1 X.2 X.3
> 1 26   1   6  11
> 2 27   2   7  12
> 3 28   3   8  13
> 4 29   4   9  14
> 5 30   5  10  15
>
>  > dat <- data.frame(list(X=I(matrix(1:15,5,3)),z=26:30))
>  > get_all_vars(~z+X,dat)
>
> z X.1 X.2 X.3
> 1 26   1   6  11
> 2 27   2   7  12
> 3 28   3   8  13
> 4 29   4   9  14
> 5 30   5  10  15
>
>  > dat <- list(X=I(matrix(1:15,5,3)),z=26:30)
>  > get_all_vars(~z+X,dat)
>
> [1] z X
> <0 rows> (or 0-length row.names)

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] gaussian family change suggestion

2006-04-11 Thread Simon Wood
Hi,

Currently the `gaussian' family's initialization code signals an error if
any response data are zero or negative and a log link is used. Given that
zero or negative response data are perfectly legitimate under the GLM
fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
it be worth changing it?

The current offending code from `gaussian' is:

initialize = expression({
n <- rep.int(1, nobs)
if (is.null(etastart) && is.null(start) && is.null(mustart) &&
((family$link == "inverse" && any(y == 0)) ||
  (family$link == "log" && any(y <= 0 stop(
  "cannot find valid starting values: please specify some")
mustart <- y
})

A possible replacement would be ...

initialize = expression({
n <- rep.int(1, nobs)
if (is.null(etastart) && is.null(start) && is.null(mustart) &&
((family$link == "inverse" && all(y == 0)) ||
  (family$link == "log" && all(y <= 0 stop(
  "cannot find valid starting values: please specify some")
mustart <- y
  if (family$link=="log") {
iy <- y<=0
    if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
  } else if (family$link=="inverse") {
iy <- y==0
if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
  }
})

best,
Simon


>- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>- +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] gaussian family change suggestion

2006-04-11 Thread Simon Wood
> > Currently the `gaussian' family's initialization code signals an error if
> > any response data are zero or negative and a log link is used. Given that
> > zero or negative response data are perfectly legitimate under the GLM
> > fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
> > it be worth changing it?
>
> Well, that's not really true: it says it is unable to find starting
> values, and asks you to provide some.  I don't really see why your choices
> are satisfactory: what happens if all the data are negative for example?

- then the suggested code traps it.

>
> We could try even harder, but code that is almost never used tends to get
> broken whilst no one is testing it.  So if you want to pursue it I think
> we need a comprehensive test suite.

- well, I guess, but I wouldn't have thought that zeros in data modelled
using `gaussian("log")' is such a rare occurance is it? [Or did you mean
that `gaussian("log")' is almost never used, and should hence be kept
simple].

I suppose there are arguments both ways...

best,
Simon

>
> > The current offending code from `gaussian' is:
> >
> > initialize = expression({
> >n <- rep.int(1, nobs)
> >if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >((family$link == "inverse" && any(y == 0)) ||
> >  (family$link == "log" && any(y <= 0 stop(
> >  "cannot find valid starting values: please specify some")
> >mustart <- y
> >})
> >
> > A possible replacement would be ...
> >
> > initialize = expression({
> >n <- rep.int(1, nobs)
> >if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >((family$link == "inverse" && all(y == 0)) ||
> >  (family$link == "log" && all(y <= 0 stop(
> >  "cannot find valid starting values: please specify some")
> >mustart <- y
> >  if (family$link=="log") {
> >iy <- y<=0
> >if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
> >  } else if (family$link=="inverse") {
> >iy <- y==0
> >if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
> >  }
> >})
> >
> > best,
> > Simon
> >
> >
> >> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
> >> - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
> --
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] gaussian family change suggestion

2006-04-11 Thread Simon Wood
> >>> Currently the `gaussian' family's initialization code signals an error if
> >>> any response data are zero or negative and a log link is used. Given that
> >>> zero or negative response data are perfectly legitimate under the GLM
> >>> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
> >>> it be worth changing it?
> >>
> >> Well, that's not really true: it says it is unable to find starting
> >> values, and asks you to provide some.  I don't really see why your choices
> >> are satisfactory: what happens if all the data are negative for example?
> >
> > - then the suggested code traps it.
>
> Yes, but why (it can happen too)?

- isn't this (i.e. all response <=0 ) a case where the MLE's of the
linear predictor must tend to minus infinity? So there's probably  an
argument for handling it differently [although my proposed error message
is not the most informative, since modifying the starting values won't help].

best,
Simon

>
> >> We could try even harder, but code that is almost never used tends to get
> >> broken whilst no one is testing it.  So if you want to pursue it I think
> >> we need a comprehensive test suite.
> >
> > - well, I guess, but I wouldn't have thought that zeros in data modelled
> > using `gaussian("log")' is such a rare occurance is it? [Or did you mean
> > that `gaussian("log")' is almost never used, and should hence be kept
> > simple].
>
> I meant initialization code for this case would rarely get used, as I
> suppose those people who do this are used to providing starting values
> (or there are few or no such people).
>
> >
> > I suppose there are arguments both ways...
> >
> > best,
> > Simon
> >
> >>
> >>> The current offending code from `gaussian' is:
> >>>
> >>> initialize = expression({
> >>>n <- rep.int(1, nobs)
> >>>if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >>>((family$link == "inverse" && any(y == 0)) ||
> >>>  (family$link == "log" && any(y <= 0 stop(
> >>>  "cannot find valid starting values: please specify some")
> >>>mustart <- y
> >>>})
> >>>
> >>> A possible replacement would be ...
> >>>
> >>> initialize = expression({
> >>>n <- rep.int(1, nobs)
> >>>if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >>>((family$link == "inverse" && all(y == 0)) ||
> >>>  (family$link == "log" && all(y <= 0 stop(
> >>>  "cannot find valid starting values: please specify some")
> >>>mustart <- y
> >>>  if (family$link=="log") {
> >>>iy <- y<=0
> >>>if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
> >>>  } else if (family$link=="inverse") {
> >>>iy <- y==0
> >>>if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
> >>>  }
> >>>})
> >>>
> >>> best,
> >>> Simon
> >>>
> >>>
> >>>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
> >>>> - +44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/
> >>>
> >>> __
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>>
> >>
> >> --
> >> Brian D. Ripley,  [EMAIL PROTECTED]
> >> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford, Tel:  +44 1865 272861 (self)
> >> 1 South Parks Road, +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UKFax:  +44 1865 272595
> >>
> >
> >
>
> --
> Brian D. Ripley,  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel