Re: [R] influence measures for multivariate linear models

Michael Friendly Tue, 14 Sep 2010 15:12:15 -0700

I'm following up on a question I posted 8/10/2010, but my newsreaderhas lost this thread.

Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define generalclasses of influence measures for multivariateregression models, including analogs of Cook's D, Andrews & PregibonCOVRATIO, etc. As in univariateresponse models, these are based on leverage and residuals based onomitting one (or more) observations ata time and refitting, although, in the univariate case, thecomputations can be optimized, as they are in
stats::influence() and related methods.
I'm interested in exploring the multivariate extension in R. I triedthe following, and was surprised to find thatR returned a result rather than an error -- presumably because mlmobjects are not trapped before they
get to lm.influence()

I've done a bit more testing, comparing what I get from R lm functionswith some direct calculations in bothR and SAS, and now conclude that there are bugs in lm.influence and thecoef(update()) methods I triedbefore to get leave-one-out coefficients for multivariate responsemodels fit with lm().By bugs, I mean that results returned are silently wrong, or at best,misleading.

My test case: a subset of the Rohwer data anayzed by Barrett & Ling (&others)


> library(heplots)
> data(Rohwer, package="heplots")
> Rohwer1 <- Rohwer[Rohwer$SES=='Hi',]

> rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,data=Rohwer1)

> (B <- coef(rohwer.mod))
                   SAT       PPVT      Raven
(Intercept) -28.467471 39.6970896 13.2438356
n             3.257132  0.0672825  0.0593474
s             2.996583  0.3699843  0.4924440
ns           -5.859063 -0.3743802 -0.1640219
na            5.666223  1.5230090  0.1189805
ss           -0.622653  0.4101567 -0.1211564

Attempt to get the leave-one-out coefficients, B(-i), for the first 2cases from influence(): *wrong* -- influence() should

probably issue

stop('Not defined for mlm objects'), or the documentation should be madeclearer for what happens in this

case.

> head(influence(rohwer.mod, do.coef=TRUE)$coefficients, 2)
         [,1]      [,2]      [,3]     [,4]       [,5]       [,6]
[1,] -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
[2,]  2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736

The correct result, for the first case is

> # direct calculation
> X <- cbind(1, as.matrix(Rohwer1[,6:10]))
> Y <- as.matrix(Rohwer1[,3:5])
> keep <- 2:n
> (B1 <- solve(t(X[keep,]) %*% X[keep,]) %*% t(X[keep,]) %*% Y[keep,])
          SAT       PPVT      Raven
   -22.899170 41.7757793 13.5057156
n    3.206119  0.0482388  0.0569482
s    3.482679  0.5514477  0.5153053
ns  -6.477172 -0.6051252 -0.1930919
na   5.607684  1.5011562  0.1162274
ss  -0.488731  0.4601508 -0.1148580

OK, ?influence tells me that the 'coefficients' returned are actuallyB-B(-i), and I can see that

it gives those for the first response variable (SAT)

> B-B1
                   SAT       PPVT       Raven
(Intercept) -5.5683009 -2.0786897 -0.26187994
n            0.0510135  0.0190437  0.00239919
s           -0.4860961 -0.1814634 -0.02286134
ns           0.6181096  0.2307451  0.02907000
na           0.0585384  0.0218528  0.00275309
ss          -0.1339219 -0.0499941 -0.00629841

For SAT, same as re-fitting a univariate model:

> rohwer.inf1 <- influence(lm(SAT ~ n + s + ns + na + ss,data=Rohwer1), do.coef=TRUE)> colnames(rohwer.inf1$coefficients) <- paste("SAT",colnames(rohwer.inf1$coefficients), sep=":")

> head(rohwer.inf1$coefficients, 2)
   SAT:(Intercept)     SAT:n     SAT:s   SAT:ns     SAT:na     SAT:ss
38        -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
39         2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736
>

But I'm looking for a method I can generalize. I also tried thefollowing, using subset= inupdate() and lm(), giving a different wrong answer. (Am I using thesubset= argument

correctly?)

> coef(update(rohwer.mod, subset=1:n !=1, data=Rohwer1))
                   SAT      PPVT      Raven
(Intercept) -28.428163 51.762326 11.0052852
n             0.912497 -0.316177  0.1294643
s             5.478358  0.280319  0.6639180
ns           -6.119679 -1.828516 -0.2995350
na            5.383479  2.096793  0.1940462
ss           -0.345764  0.448199 -0.0725977
Warning message:
In 1:n : numerical expression has 32 elements: only the first used

So, how can I calculate leave-one-out coefficients (and otherquantities) efficiently

for mlm-s?

-Michael



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] influence measures for multivariate linear models

Reply via email to