Dear Martin, Thank you for taking care of this.
Best, John > -----Original Message----- > From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] > Sent: Thursday, November 2, 2017 4:59 PM > To: Fox, John <j...@mcmaster.ca> > Cc: Martin Maechler <maech...@stat.math.ethz.ch>; Therneau, Terry M., > Ph.D. <thern...@mayo.edu>; r-devel@r-project.org > Subject: RE: [Rd] vcov and survival > > >>>>> Fox, John <j...@mcmaster.ca> > >>>>> on Thu, 14 Sep 2017 13:46:44 +0000 writes: > > > Dear Martin, I made three points which likely got lost > > because of the way I presented them: > > > (1) Singularity is an unusual situation and should be made > > more prominent. It typically reflects a problem with the > > data or the specification of the model. That's not to say > > that it *never* makes sense to allow singular fits (as in > > the situations you mentions). > > > I'd favour setting singular.ok=FALSE as the default, but > > in the absence of that a warning or at least a note. A > > compromise would be to have a singular.ok option() that > > would be FALSE out of the box. > > > Any changes would have to be made very carefully so as not > > to create chaos. > > I for one, am too reluctant to want to change the default there. > > > That goes for the points below as well. > > > (2) coef() and vcov() behave inconsistently, which can be > > problematic because one often uses them together in code. > > indeed; and I had agreed on that. > As of today, in R-devel only they now behave compatibly. > NEWS entry > > • The “default” ("lm" etc) methods of vcov() have gained new > optional argument complete = TRUE which makes the vcov() methods > more consistent with the coef() methods in the case of singular > designs. The former behavior is now achieved by vcov(*, > complete=FALSE). > > > > (3) As you noticed in your second message, lm() has a > > singular.ok argument and glm() doesn't. > > and that has been amended even earlier (a bit more than a month > ago) in R-devel svn rev 73380 with NEWS entry > > • glm() and glm.fit get the same singular.ok=TRUE argument that > lm() has had forever. As a consequence, in glm(*, method = > <your_own>), user specified methods need to accept a singular.ok > argument as well. > > > I'll take a look at the code for glm() with an eye towards > > creating a patch, but I'm a bit reluctant to mess with the > > code for something as important as glm(). > > and as a matter of fact you did send me +- the R code part of that change. > > My current plan is to also add the 'complete = TRUE' option to the "basic" > coef() methods, such that you also have consistent coef(*, complete=FALSE) > and vcov(*, complete=FALSE) behaviors. > > Thank you and Terry (and others?) for bringing up the issues and discussing > them thoroughly! > > Best, > Martin. > > > > Best, John > > > > >> -----Original Message----- From: Martin Maechler > >> [mailto:maech...@stat.math.ethz.ch] Sent: Thursday, > >> September 14, 2017 4:23 AM To: Martin Maechler > >> <maech...@stat.math.ethz.ch> Cc: Fox, John > >> <j...@mcmaster.ca>; Therneau, Terry M., Ph.D. > >> <thern...@mayo.edu>; r-devel@r-project.org Subject: Re: > >> [Rd] vcov and survival > >> > >> >>>>> Martin Maechler <maech...@stat.math.ethz.ch> >>>>> > >> on Thu, 14 Sep 2017 10:13:02 +0200 writes: > >> > >> >>>>> Fox, John <j...@mcmaster.ca> >>>>> on Wed, 13 Sep > >> 2017 22:45:07 +0000 writes: > >> > >> >> Dear Terry, >> Even the behaviour of lm() and glm() > >> isn't entirely consistent. In both cases, singularity > >> results in NA coefficients by default, and these are > >> reported in the model summary and coefficient vector, but > >> not in the coefficient covariance matrix: > >> > >> >> ---------------- > >> > >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + > >> Population), >> + data=longley) >>> summary(mod.lm) > >> > >> >> Call: >> lm(formula = Employed ~ GNP + Population + > >> I(GNP + Population), >> data = longley) > >> > >> >> Residuals: >> Min 1Q Median 3Q Max >> -0.80899 > >> -0.33282 -0.02329 0.25895 1.08800 > >> > >> >> Coefficients: (1 not defined because of singularities) > >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) > >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317 > >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214 > >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA > >> >> --- > >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' > >> 0.1 ' ' 1 > >> > >> >> Residual standard error: 0.5459 on 13 degrees of > >> freedom >> Multiple R-squared: 0.9791, Adjusted > >> R-squared: 0.9758 >> F-statistic: 303.9 on 2 and 13 DF, > >> p-value: 1.221e-11 > >> > >> >>> vcov(mod.lm) >> (Intercept) GNP Population >> > >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP > >> 0.1445618 0.0001133631 -0.0016054 >> Population > >> -2.0954381 -0.0016053999 0.0231456 >>> coef(mod.lm) >> > >> (Intercept) GNP Population I(GNP + Population) >> > >> 88.93879831 0.06317244 -0.40974292 NA > >> >>> > >> >>> mod.glm <- glm(Employed ~ GNP + Population + I(GNP + > >> Population), >> + data=longley) >>> summary(mod.glm) > >> > >> >> Call: >> glm(formula = Employed ~ GNP + Population + > >> I(GNP + Population), >> data = longley) > >> > >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> > >> -0.80899 -0.33282 -0.02329 0.25895 1.08800 > >> > >> >> Coefficients: (1 not defined because of singularities) > >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) > >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317 > >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214 > >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA > >> >> --- > >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' > >> 0.1 ' ' 1 > >> > >> >> (Dispersion parameter for gaussian family taken to be > >> 0.2980278) > >> > >> >> Null deviance: 185.0088 on 15 degrees of freedom >> > >> Residual deviance: 3.8744 on 13 degrees of freedom >> > >> AIC: 30.715 > >> > >> >> Number of Fisher Scoring iterations: 2 > >> > >> >>> coef(mod.glm) >> (Intercept) GNP Population I(GNP + > >> Population) >> 88.93879831 0.06317244 -0.40974292 NA >>> > >> vcov(mod.glm) >> (Intercept) GNP Population >> > >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP > >> 0.1445618 0.0001133631 -0.0016054 >> Population > >> -2.0954381 -0.0016053999 0.0231456 > >> > >> >> ---------------- > >> > >> >> Moreoever, lm() has a singular.ok() argument that > >> defaults to TRUE, but glm() doesn't have this argument: > >> > >> >> ---------------- > >> > >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + > >> Population), >> + data=longley, singular.ok=FALSE) >> > >> Error in lm.fit(x, y, offset = offset, singular.ok = > >> singular.ok, ...) : >> singular fit encountered > >> > >> >> ---------------- > >> > >> >> In my opinion, singularity should at least produce a > >> warning, both in calls to lm() and glm(), and in > >> summary() output. Even better, again in my opinion, would > >> be to produce an error by default in this situation, but > >> doing so would likely break too much existing code. > >> > >> > Yes, I would not want to change. Note that this is > >> from S > already, i.e., long "ingrained". I think there > >> one argument was > that there are situations with factor > >> predictors of many levels > and conceptually their 2- or > >> even 3-way interactions (!) > where it is neat to just > >> fit the model, (-> get residuals and > fitted values) and > >> also see implicitly the "necessary rank" of > prediction > >> space, or rather even more specifically, you see for > > >> every factor how many levels are "distinguishable"/useful > >> for > prediction, given the data. > >> > >> >> I prefer NA to 0 for the redundant coefficients > >> because it at least suggests that the decision about what > >> to exclude is arbitrary, and of course simply excluding > >> coefficients isn't the only way to proceed. > >> > >> > I'm less modest and would say *definitely*, NA's are > >> highly > prefered in such a situation. > >> > >> >> Finally, the differences in behaviour between coef() > >> and vcov() and between lm() and glm() aren't really > >> sensible. > >> > >> > I really haven't seen any difference between lm() and > >> glm() in > the example above. Maybe you can point them > >> out for me. > >> > >> .. now I saw it: lm() has a 'singular.ok = TRUE' argument > >> which you can set to FALSE if you prefer an error to NA > >> coefficients. > >> > >> I also agree with you John that it would be nice if glm() > >> also got such an argument. Patches are welcome and seem > >> easy. Nowadays we prefer them as attachments (diff/patch > >> file!) at R's https://bugs.r-project.org bugzilla against > >> the svn source, here > >> https://svn.r-project.org/R/trunk/src/library/stats/R/glm.R > >> and > >> https://svn.r-project.org/R/trunk/src/library/stats/man/glm.Rd > >> > >> > I do quite agree that vcov() should be compatible with > >> > coef() [and summary()] for both 'lm' and 'glm' methods, > >> i.e., > should get NA rows and columns there. This would > >> require > eliminating these before e.g. using it in > >> solve(<vcov>, *) etc, > but I think it would be a good > >> idea that the useR must deal with > these NAs actively. > >> > >> > Shall "we" try and see the fallout in CRAN space? > >> > >> >> Maybe there's some reason for all this that escapes > >> me. > (for the first one---"no error"--- I gave a > >> reason) > >> > >> >> Best, >> John > >> > >> >> -------------------------------------- > >> >> John Fox, Professor Emeritus >> McMaster University >> > >> Hamilton, Ontario, Canada >> Web: > >> socserv.mcmaster.ca/jfox > >> > >> > >> > >> > >> >>> -----Original Message----- >>> From: R-devel > >> [mailto:r-devel-boun...@r-project.org] On Behalf Of >>> > >> Therneau, Terry M., Ph.D. >>> Sent: Wednesday, September > >> 13, 2017 6:19 PM >>> To: r-devel@r-project.org >>> > >> Subject: [Rd] vcov and survival > >> >>> > >> >>> I have just noticed a difference in behavior between > >> coxph and lm/glm: >>> if one or more of the coefficients > >> from the fit in NA, then lm and glm >>> omit that > >> row/column from the variance matrix; while coxph retains > >> it >>> but sets the values to zero. > >> >>> > >> >>> Is this something that should be "fixed", i.e., made > >> to agree? I >>> suspect that doing so will break other > >> packages, but then NA coefs are >>> rather rare so > >> perhaps not. > >> >>> > >> >>> Terry Therneau ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel