[Rd] Some "lm" methods give wrong results when applied to "mlm" objects

Jari Oksanen Tue, 04 Apr 2017 01:25:05 -0700

I had a look at some influence measures, and it seems to me that currently 
several methods handle multiple lm (mlm) objects wrongly in R. In some cases 
there are separate "mlm" methods, but usually "mlm" objects are handled by the 
same methods as univariate "lm" methods, and in some cases this fails.


There are two general patterns of problems in influence measures:

1) The univariate methods assume that overall standard deviation (sd) is of 
length one, but for "mlm" models we have a multivariate response with a 
multicolumn residual matrix. The functions also get correctly the sd vector 
corresponding to the columns, but it is not applied to these, but recycled for 
rows. This influences rstandard.lm and cooks.distance.lm. For instance, in 
cooks.distance.lm we have ((res/(sd * (1 - hat)))^2 * hat)/p, where res is a n 
x m matrix, sd is a m-vector and hat is a n-vector).  Both of these functions 
are very easily fixed.

2) Another problem is that several functions are based on lm.influence 
function, and it seems that it returns elements sigma and coefficients that are 
only based on the first variable (first column of the residual matrix wt.res) 
and give wrong results for other variables. This will influence functions 
dfbeta.lm (coefficients), dfbetas.lm (coefficients, sigma), dffits (sigma), 
rstudent.lm (sigma) and covratio (sigma). lm.influence finds these elements in 
compiled code and this is harder to fix. MASS (the book & the package) avoid 
using compiled code in their (univariate) studentized residuals, and instead 
use a clever short-cut.

In addition to these, there are a couple of other cases which seem to fail with 
"mlm" models: 

confint.lm gives empty result, because the length of the results is defined by 
names(coef(object)) which is NULL because "mlm" objects return a matrix of 
coefficients instead of a vector with names.

dummy.coef fails because "mlm" objects do not have xlevels item.

extractAIC.lm returns only one value instead of a vector, and edf is 
misleading. Separate deviance.mlm returns a vector of deviances, and logLik.lm 
returns "'logLik.lm' does not support multiple responses". Probably 
extractAIC.lm should work like logLik.lm.

Several methods already handle "mlm" methods by returning message "xxxx  is not 
yet implemented for multivariate lm()" which of course is a natural and correct 
solution to the problems.

Cheers, Jari Oksanen
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Some "lm" methods give wrong results when applied to "mlm" objects

Reply via email to