Dear Gavin and Paul,

(k + 1)/n is the average hatvalue. The 2(k + 1)/n rule comes from results in Belsley, Kuh, and Welsch (1980), Regression Diagnostics, concerning the distribution of the hatvalues when n is large relative to k + 1 and X is multivariate normal. For smaller n, this rule tends to nominate too many points, hence the rule 3(k + 1)/n, which I think is also due to Belsley et al.
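The statement that (k + 1)/n is the average hatvalue follows from the trace of the hat matrix. A short derivation, assuming X is the n-by-(k + 1) model matrix (intercept column included) with full column rank:

```latex
H = X (X^\top X)^{-1} X^\top, \qquad h_{ii} = [H]_{ii}

\sum_{i=1}^{n} h_{ii} = \operatorname{tr}(H)
  = \operatorname{tr}\!\left( X (X^\top X)^{-1} X^\top \right)
  = \operatorname{tr}\!\left( (X^\top X)^{-1} X^\top X \right)
  = \operatorname{tr}(I_{k+1}) = k + 1

\bar{h} = \frac{k+1}{n}, \qquad
2\bar{h} = \frac{2(k+1)}{n}, \qquad
3\bar{h} = \frac{3(k+1)}{n}
```

The two rules of thumb are therefore simply 2 and 3 times the average hatvalue.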
I'd prefer to call such hatvalues "noteworthy" rather than "influential," since hatvalues measure "leverage" on the least-squares fit rather than influence on the coefficients. Finally, I think it's a better idea to examine diagnostics such as hatvalues graphically than to pay too much attention to numerical cutoffs.

Regards,
 John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> project.org] On Behalf Of Gavin Simpson
> Sent: March-09-08 5:54 AM
> To: Paul Lynch
> Cc: r-help@r-project.org
> Subject: Re: [R] Formula for whether hat value is influential?
>
> On Sat, 2008-03-08 at 19:38 -0800, Paul Lynch wrote:
> > I was wondering if someone might be able to tell me what formula R's
> > influence.measures function uses for determining whether the hat value
> > it computes is influential (i.e., the true/false value in the "hat"
> > column of the returned is.inf matrix). The reason I'm asking is that
> > its results disagree with what I've just learned in my statistics
> > class, namely that a point should be considered influential if
> > h_ii > 2(k+1)/n, where k+1 is the number of parameters in the model
> > and n is the number of data points. My 2(k+1)/n value would mark at
> > least one more point influential than influence.measures does for the
> > data set I'm looking at.
>
> This is R: because it is open source, you have access to all the
> source code. Type influence.measures (without the ()) at the prompt to
> see a version without any comments.
>
> In the in-line function is.influential(), you'll find the critical
> levels used. The hat values are in infmat[, k + 4], which is the last
> column (where k is the number of terms in the model, including the
> intercept if present).
> The relevant part of is.influential is:
>
>     infmat[, k + 4] > (3 * k)/n
>
> So R is using (3*(k+1)) / n in your notation (in the R code k is the
> number of terms in the model, *including* the intercept if present in
> the model).
>
> The function was originally in John Fox's car package, which is support
> software for his book Companion to Applied Regression. In that book,
> IIRC, Fox uses two cut-offs for hat values, 2 or 3 times the average
> hat value, as indicating influential observations. R is using the upper
> level here. I would check out some of the references cited in the
> References section of ?influence.measures to see why this has been
> chosen.
>
> HTH
>
> G
>
> >
> > I am using R 2.4.1 under Windows. (Upgrading is difficult due to
> > rather severe security policies.)
> >
> > Thanks,
> >
> > --Paul
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
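To make the notation difference concrete, here is a minimal Python sketch (not R, and not the actual influence.measures source) that computes hat values for an ordinary least-squares fit with an intercept and applies both cutoffs. The helper names invert, hat_values, flag_r_rule, and flag_class_rule are hypothetical; flag_r_rule only mimics the R expression infmat[, k + 4] > (3 * k)/n, with k counting every coefficient, while flag_class_rule applies 2(k+1)/n with k counting predictors only. The x values are made up for illustration.

```python
# Sketch, assuming plain OLS with an intercept column in the design matrix.
# All function names are hypothetical illustrations, not an R or library API.


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    m = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def hat_values(X):
    """Diagonal of H = X (X'X)^{-1} X' for a design matrix X given as rows."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    inv = invert(xtx)
    # h_ii = x_i' (X'X)^{-1} x_i
    return [sum(row[i] * inv[i][j] * row[j] for i in range(p) for j in range(p))
            for row in X]


def flag_r_rule(h, k_terms):
    """Mimics infmat[, k + 4] > (3 * k)/n: k_terms counts ALL coefficients."""
    n = len(h)
    return [hi > 3 * k_terms / n for hi in h]


def flag_class_rule(h, k_predictors):
    """2(k+1)/n rule: k_predictors excludes the intercept (the '+1')."""
    n = len(h)
    return [hi > 2 * (k_predictors + 1) / n for hi in h]


# Made-up data: one predictor plus an intercept, so k_predictors = 1, k_terms = 2.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13]
X = [[1.0, float(xi)] for xi in x]
h = hat_values(X)

print("sum of hat values:", round(sum(h), 6))  # trace(H) = number of coefficients
for xi, hi, c, r in zip(x, h, flag_class_rule(h, 1), flag_r_rule(h, 2)):
    print(f"x = {xi:2d}  h = {hi:.3f}  2(k+1)/n: {c!s:5}  R (3k/n): {r!s:5}")
```

With these made-up values the point at x = 13 has a hat value of about 0.54, which exceeds 2(k+1)/n = 0.4 but not 3(k+1)/n = 0.6: the same kind of disagreement Paul describes, arising purely from the choice of multiplier once the two k conventions are reconciled.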