>>>>> Fox, John <j...@mcmaster.ca> >>>>> on Thu, 14 Sep 2017 13:46:44 +0000 writes:
> Dear Martin, I made three points which likely got lost > because of the way I presented them: > (1) Singularity is an unusual situation and should be made > more prominent. It typically reflects a problem with the > data or the specification of the model. That's not to say > that it *never* makes sense to allow singular fits (as in > the situations you mentions). > I'd favour setting singular.ok=FALSE as the default, but > in the absence of that a warning or at least a note. A > compromise would be to have a singular.ok option() that > would be FALSE out of the box. > Any changes would have to be made very carefully so as not > to create chaos. I for one, am too reluctant to want to change the default there. > That goes for the points below as well. > (2) coef() and vcov() behave inconsistently, which can be > problematic because one often uses them together in code. indeed; and I had agreed on that. As of today, in R-devel only they now behave compatibly. NEWS entry • The “default” ("lm" etc) methods of vcov() have gained new optional argument complete = TRUE which makes the vcov() methods more consistent with the coef() methods in the case of singular designs. The former behavior is now achieved by vcov(*, complete=FALSE). > (3) As you noticed in your second message, lm() has a > singular.ok argument and glm() doesn't. and that has been amended even earlier (a bit more than a month ago) in R-devel svn rev 73380 with NEWS entry • glm() and glm.fit get the same singular.ok=TRUE argument that lm() has had forever. As a consequence, in glm(*, method = <your_own>), user specified methods need to accept a singular.ok argument as well. > I'll take a look at the code for glm() with an eye towards > creating a patch, but I'm a bit reluctant to mess with the > code for something as important as glm(). and as a matter of fact you did send me +- the R code part of that change. My current plan is to also add the 'complete = TRUE' option to the "basic" coef() methods, such that you also have consistent coef(*, complete=FALSE) and vcov(*, complete=FALSE) behaviors. Thank you and Terry (and others?) for bringing up the issues and discussing them thoroughly! Best, Martin. > Best, John >> -----Original Message----- From: Martin Maechler >> [mailto:maech...@stat.math.ethz.ch] Sent: Thursday, >> September 14, 2017 4:23 AM To: Martin Maechler >> <maech...@stat.math.ethz.ch> Cc: Fox, John >> <j...@mcmaster.ca>; Therneau, Terry M., Ph.D. >> <thern...@mayo.edu>; r-devel@r-project.org Subject: Re: >> [Rd] vcov and survival >> >> >>>>> Martin Maechler <maech...@stat.math.ethz.ch> >>>>> >> on Thu, 14 Sep 2017 10:13:02 +0200 writes: >> >> >>>>> Fox, John <j...@mcmaster.ca> >>>>> on Wed, 13 Sep >> 2017 22:45:07 +0000 writes: >> >> >> Dear Terry, >> Even the behaviour of lm() and glm() >> isn't entirely consistent. In both cases, singularity >> results in NA coefficients by default, and these are >> reported in the model summary and coefficient vector, but >> not in the coefficient covariance matrix: >> >> >> ---------------- >> >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + >> Population), >> + data=longley) >>> summary(mod.lm) >> >> >> Call: >> lm(formula = Employed ~ GNP + Population + >> I(GNP + Population), >> data = longley) >> >> >> Residuals: >> Min 1Q Median 3Q Max >> -0.80899 >> -0.33282 -0.02329 0.25895 1.08800 >> >> >> Coefficients: (1 not defined because of singularities) >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317 >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214 >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA >> >> --- >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' >> 0.1 ' ' 1 >> >> >> Residual standard error: 0.5459 on 13 degrees of >> freedom >> Multiple R-squared: 0.9791, Adjusted >> R-squared: 0.9758 >> F-statistic: 303.9 on 2 and 13 DF, >> p-value: 1.221e-11 >> >> >>> vcov(mod.lm) >> (Intercept) GNP Population >> >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP >> 0.1445618 0.0001133631 -0.0016054 >> Population >> -2.0954381 -0.0016053999 0.0231456 >>> coef(mod.lm) >> >> (Intercept) GNP Population I(GNP + Population) >> >> 88.93879831 0.06317244 -0.40974292 NA >> >>> >> >>> mod.glm <- glm(Employed ~ GNP + Population + I(GNP + >> Population), >> + data=longley) >>> summary(mod.glm) >> >> >> Call: >> glm(formula = Employed ~ GNP + Population + >> I(GNP + Population), >> data = longley) >> >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> >> -0.80899 -0.33282 -0.02329 0.25895 1.08800 >> >> >> Coefficients: (1 not defined because of singularities) >> >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) >> 88.93880 13.78503 6.452 2.16e-05 *** >> GNP 0.06317 >> 0.01065 5.933 4.96e-05 *** >> Population -0.40974 0.15214 >> -2.693 0.0184 * >> I(GNP + Population) NA NA NA NA >> >> --- >> >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' >> 0.1 ' ' 1 >> >> >> (Dispersion parameter for gaussian family taken to be >> 0.2980278) >> >> >> Null deviance: 185.0088 on 15 degrees of freedom >> >> Residual deviance: 3.8744 on 13 degrees of freedom >> >> AIC: 30.715 >> >> >> Number of Fisher Scoring iterations: 2 >> >> >>> coef(mod.glm) >> (Intercept) GNP Population I(GNP + >> Population) >> 88.93879831 0.06317244 -0.40974292 NA >>> >> vcov(mod.glm) >> (Intercept) GNP Population >> >> (Intercept) 190.0269691 0.1445617813 -2.0954381 >> GNP >> 0.1445618 0.0001133631 -0.0016054 >> Population >> -2.0954381 -0.0016053999 0.0231456 >> >> >> ---------------- >> >> >> Moreoever, lm() has a singular.ok() argument that >> defaults to TRUE, but glm() doesn't have this argument: >> >> >> ---------------- >> >> >>> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + >> Population), >> + data=longley, singular.ok=FALSE) >> >> Error in lm.fit(x, y, offset = offset, singular.ok = >> singular.ok, ...) : >> singular fit encountered >> >> >> ---------------- >> >> >> In my opinion, singularity should at least produce a >> warning, both in calls to lm() and glm(), and in >> summary() output. Even better, again in my opinion, would >> be to produce an error by default in this situation, but >> doing so would likely break too much existing code. >> >> > Yes, I would not want to change. Note that this is >> from S > already, i.e., long "ingrained". I think there >> one argument was > that there are situations with factor >> predictors of many levels > and conceptually their 2- or >> even 3-way interactions (!) > where it is neat to just >> fit the model, (-> get residuals and > fitted values) and >> also see implicitly the "necessary rank" of > prediction >> space, or rather even more specifically, you see for > >> every factor how many levels are "distinguishable"/useful >> for > prediction, given the data. >> >> >> I prefer NA to 0 for the redundant coefficients >> because it at least suggests that the decision about what >> to exclude is arbitrary, and of course simply excluding >> coefficients isn't the only way to proceed. >> >> > I'm less modest and would say *definitely*, NA's are >> highly > prefered in such a situation. >> >> >> Finally, the differences in behaviour between coef() >> and vcov() and between lm() and glm() aren't really >> sensible. >> >> > I really haven't seen any difference between lm() and >> glm() in > the example above. Maybe you can point them >> out for me. >> >> .. now I saw it: lm() has a 'singular.ok = TRUE' argument >> which you can set to FALSE if you prefer an error to NA >> coefficients. >> >> I also agree with you John that it would be nice if glm() >> also got such an argument. Patches are welcome and seem >> easy. Nowadays we prefer them as attachments (diff/patch >> file!) at R's https://bugs.r-project.org bugzilla against >> the svn source, here >> https://svn.r-project.org/R/trunk/src/library/stats/R/glm.R >> and >> https://svn.r-project.org/R/trunk/src/library/stats/man/glm.Rd >> >> > I do quite agree that vcov() should be compatible with >> > coef() [and summary()] for both 'lm' and 'glm' methods, >> i.e., > should get NA rows and columns there. This would >> require > eliminating these before e.g. using it in >> solve(<vcov>, *) etc, > but I think it would be a good >> idea that the useR must deal with > these NAs actively. >> >> > Shall "we" try and see the fallout in CRAN space? >> >> >> Maybe there's some reason for all this that escapes >> me. > (for the first one---"no error"--- I gave a >> reason) >> >> >> Best, >> John >> >> >> -------------------------------------- >> >> John Fox, Professor Emeritus >> McMaster University >> >> Hamilton, Ontario, Canada >> Web: >> socserv.mcmaster.ca/jfox >> >> >> >> >> >>> -----Original Message----- >>> From: R-devel >> [mailto:r-devel-boun...@r-project.org] On Behalf Of >>> >> Therneau, Terry M., Ph.D. >>> Sent: Wednesday, September >> 13, 2017 6:19 PM >>> To: r-devel@r-project.org >>> >> Subject: [Rd] vcov and survival >> >>> >> >>> I have just noticed a difference in behavior between >> coxph and lm/glm: >>> if one or more of the coefficients >> from the fit in NA, then lm and glm >>> omit that >> row/column from the variance matrix; while coxph retains >> it >>> but sets the values to zero. >> >>> >> >>> Is this something that should be "fixed", i.e., made >> to agree? I >>> suspect that doing so will break other >> packages, but then NA coefs are >>> rather rare so >> perhaps not. >> >>> >> >>> Terry Therneau ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel