Re: [Rd] vcov and survival

Fox, John Wed, 13 Sep 2017 15:46:09 -0700

Dear Terry,

Even the behaviour of lm() and glm() isn't entirely consistent. In both cases, 
singularity results in NA coefficients by default, and these are reported in 
the model summary and coefficient vector, but not in the coefficient covariance 
matrix:


----------------        

> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + Population), 
+              data=longley)
> summary(mod.lm)

Call:
lm(formula = Employed ~ GNP + Population + I(GNP + Population), 
    data = longley)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.80899 -0.33282 -0.02329  0.25895  1.08800 

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         88.93880   13.78503   6.452 2.16e-05 ***
GNP                  0.06317    0.01065   5.933 4.96e-05 ***
Population          -0.40974    0.15214  -2.693   0.0184 *  
I(GNP + Population)       NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5459 on 13 degrees of freedom
Multiple R-squared:  0.9791,    Adjusted R-squared:  0.9758 
F-statistic: 303.9 on 2 and 13 DF,  p-value: 1.221e-11

> vcov(mod.lm)
            (Intercept)           GNP Population
(Intercept) 190.0269691  0.1445617813 -2.0954381
GNP           0.1445618  0.0001133631 -0.0016054
Population   -2.0954381 -0.0016053999  0.0231456
> coef(mod.lm)
        (Intercept)                 GNP          Population I(GNP + Population) 
        88.93879831          0.06317244         -0.40974292                  NA 
> 
> mod.glm <- glm(Employed ~ GNP + Population + I(GNP + Population), 
+               data=longley)
> summary(mod.glm)

Call:
glm(formula = Employed ~ GNP + Population + I(GNP + Population), 
    data = longley)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.80899  -0.33282  -0.02329   0.25895   1.08800  

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         88.93880   13.78503   6.452 2.16e-05 ***
GNP                  0.06317    0.01065   5.933 4.96e-05 ***
Population          -0.40974    0.15214  -2.693   0.0184 *  
I(GNP + Population)       NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2980278)

    Null deviance: 185.0088  on 15  degrees of freedom
Residual deviance:   3.8744  on 13  degrees of freedom
AIC: 30.715

Number of Fisher Scoring iterations: 2

> coef(mod.glm)
        (Intercept)                 GNP          Population I(GNP + Population) 
        88.93879831          0.06317244         -0.40974292                  NA 
> vcov(mod.glm)
            (Intercept)           GNP Population
(Intercept) 190.0269691  0.1445617813 -2.0954381
GNP           0.1445618  0.0001133631 -0.0016054
Population   -2.0954381 -0.0016053999  0.0231456

----------------        

Moreoever, lm() has a singular.ok() argument that defaults to TRUE, but glm() 
doesn't have this argument:

----------------        

> mod.lm <- lm(Employed ~ GNP + Population + I(GNP + Population), 
+              data=longley, singular.ok=FALSE)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  singular fit encountered

----------------        

In my opinion, singularity should at least produce a warning, both in calls to 
lm() and glm(), and in summary() output. Even better, again in my opinion, 
would be to produce an error by default in this situation, but doing so would 
likely break too much existing code. 

I prefer NA to 0 for the redundant coefficients because it at least suggests 
that the decision about what to exclude is arbitrary, and of course simply 
excluding coefficients isn't the only way to proceed. 

Finally, the differences in behaviour between coef() and vcov() and between 
lm() and glm() aren't really sensible.

Maybe there's some reason for all this that escapes me.

Best,
 John

--------------------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socserv.mcmaster.ca/jfox




> -----Original Message-----
> From: R-devel [mailto:[email protected]] On Behalf Of
> Therneau, Terry M., Ph.D.
> Sent: Wednesday, September 13, 2017 6:19 PM
> To: [email protected]
> Subject: [Rd] vcov and survival
> 
> I have just noticed a difference in behavior between coxph and lm/glm:
> if one or more of the coefficients from the fit in NA, then lm and glm
> omit that row/column from the variance matrix; while coxph retains it
> but sets the values to zero.
> 
>    Is this something that should be "fixed", i.e., made to agree? I
> suspect that doing so will break other packages, but then NA coefs are
> rather rare so perhaps not.
> 
> Terry Therneau
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] vcov and survival

Reply via email to