I'm following up on a question I posted 8/10/2010, but my newsreader has lost this thread.

Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes of influence measures for multivariate regression models, including analogs of Cook's D, Andrews & Pregibon COVRATIO, etc. As in univariate response models, these are based on leverage and residuals based on omitting one (or more) observations at a time and refitting, although, in the univariate case, the computations can be optimized, as they are in
stats::influence() and related methods.

I'm interested in exploring the multivariate extension in R. I tried the following, and was surprised to find that R returned a result rather than an error -- presumably because mlm objects are not trapped before they
get to lm.influence()

I've done a bit more testing, comparing what I get from R lm functions with some direct calculations in both R and SAS, and now conclude that there are bugs in lm.influence and the coef(update()) methods I tried before to get leave-one-out coefficients for multivariate response models fit with lm(). By bugs, I mean that results returned are silently wrong, or at best, misleading.

My test case: a subset of the Rohwer data anayzed by Barrett & Ling (& others)

> library(heplots)
> data(Rohwer, package="heplots")
> Rohwer1 <- Rohwer[Rohwer$SES=='Hi',]
> rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer1)
> (B <- coef(rohwer.mod))
                   SAT       PPVT      Raven
(Intercept) -28.467471 39.6970896 13.2438356
n             3.257132  0.0672825  0.0593474
s             2.996583  0.3699843  0.4924440
ns           -5.859063 -0.3743802 -0.1640219
na            5.666223  1.5230090  0.1189805
ss           -0.622653  0.4101567 -0.1211564

Attempt to get the leave-one-out coefficients, B(-i), for the first 2 cases from influence(): *wrong* -- influence() should
probably issue
stop('Not defined for mlm objects'), or the documentation should be made clearer for what happens in this
case.

> head(influence(rohwer.mod, do.coef=TRUE)$coefficients, 2)
         [,1]      [,2]      [,3]     [,4]       [,5]       [,6]
[1,] -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
[2,]  2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736

The correct result, for the first case is

> # direct calculation
> X <- cbind(1, as.matrix(Rohwer1[,6:10]))
> Y <- as.matrix(Rohwer1[,3:5])
> keep <- 2:n
> (B1 <- solve(t(X[keep,]) %*% X[keep,]) %*% t(X[keep,]) %*% Y[keep,])
          SAT       PPVT      Raven
   -22.899170 41.7757793 13.5057156
n    3.206119  0.0482388  0.0569482
s    3.482679  0.5514477  0.5153053
ns  -6.477172 -0.6051252 -0.1930919
na   5.607684  1.5011562  0.1162274
ss  -0.488731  0.4601508 -0.1148580

OK, ?influence tells me that the 'coefficients' returned are actually B-B(-i), and I can see that
it gives those for the first response variable (SAT)

> B-B1
                   SAT       PPVT       Raven
(Intercept) -5.5683009 -2.0786897 -0.26187994
n            0.0510135  0.0190437  0.00239919
s           -0.4860961 -0.1814634 -0.02286134
ns           0.6181096  0.2307451  0.02907000
na           0.0585384  0.0218528  0.00275309
ss          -0.1339219 -0.0499941 -0.00629841

For SAT, same as re-fitting a univariate model:

> rohwer.inf1 <- influence(lm(SAT ~ n + s + ns + na + ss, data=Rohwer1), do.coef=TRUE) > colnames(rohwer.inf1$coefficients) <- paste("SAT", colnames(rohwer.inf1$coefficients), sep=":")
> head(rohwer.inf1$coefficients, 2)
   SAT:(Intercept)     SAT:n     SAT:s   SAT:ns     SAT:na     SAT:ss
38        -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
39         2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736
>

But I'm looking for a method I can generalize. I also tried the following, using subset= in update() and lm(), giving a different wrong answer. (Am I using the subset= argument
correctly?)

> coef(update(rohwer.mod, subset=1:n !=1, data=Rohwer1))
                   SAT      PPVT      Raven
(Intercept) -28.428163 51.762326 11.0052852
n             0.912497 -0.316177  0.1294643
s             5.478358  0.280319  0.6639180
ns           -6.119679 -1.828516 -0.2995350
na            5.383479  2.096793  0.1940462
ss           -0.345764  0.448199 -0.0725977
Warning message:
In 1:n : numerical expression has 32 elements: only the first used

So, how can I calculate leave-one-out coefficients (and other quantities) efficiently
for mlm-s?

-Michael



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to