Re: [R] lm() and dffits

Dieter Menne Sun, 31 Aug 2008 13:09:49 -0700

Ranney, Steven <steven.ranney <at> montana.edu> writes:

> 1) fit a simple lm(LW~LL)
> 2) calculate the dffits for those data points
> 3) remove those data points whose dffits exceed 2*sqrt(p/n) (where p=the
> number of parameters and n=number of data points; p=3 in a linear model,
> correct?  Intercept, slope, and error term?)
> 4) rerun the model MINUS those data points
> 5) compare the two lm()
> 
> Now, each of these steps I can do separately, but only by outputting the 
> dffits to a .csv then removing the large dffits by hand, reading the .csv 
> back into R, rerunning the lm(), and comparing the first lm() to the second 
> lm().  I would imagine that there is a better (easier, I hope!) way of doing 
> all of this.  Any ideas?  
>


You could do the following:

# --------------------
x = rnorm(100)
y = rnorm(100)
y[40] = y[40] + 30 # generate an outlier
df = data.frame(x=x, y=y)
lmfit1 = lm(y~x, data=df) # fit all data
thresh = 3 # choose any data-dependent threshold
nice = abs(dffits(lmfit1)) < thresh
# note that nice[40] should be the only FALSE
df2 = df[nice,]
lmfit2 = lm(y~x, data=df2)

summary(lmfit1)
summary(lmfit2)
# --------------------
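
If you want the 2*sqrt(p/n) cutoff from the question rather than a fixed
threshold, it can be computed from the fitted model. A minimal sketch,
assuming the usual convention that p counts the estimated coefficients
(intercept and slope, so p = 2 here; the error variance is not counted).
The seed and the names fit1, fit2 and keep are only illustrative:

```r
set.seed(1)                        # assumed seed, only for reproducibility
df <- data.frame(x = rnorm(100), y = rnorm(100))
df$y[40] <- df$y[40] + 30          # same artificial outlier as above

fit1 <- lm(y ~ x, data = df)       # fit all data
p <- length(coef(fit1))            # number of coefficients (2)
n <- nobs(fit1)                    # number of observations used in the fit
thresh <- 2 * sqrt(p / n)          # cutoff from the question

keep <- abs(dffits(fit1)) <= thresh
fit2 <- lm(y ~ x, data = df[keep, ])  # refit without the flagged points

# coefficients of both fits side by side
rbind(all = coef(fit1), trimmed = coef(fit2))
```

With this cutoff usually more points than the single outlier are
flagged, so check sum(!keep) before trusting the second fit.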

However, this is a bit Denver-Style Home-Brewery. Instead of using this 
ad-hoc method, you are probably better off using one of the robust methods, for
example in MASS.
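
As one possible illustration of the robust route, here is a sketch using
rlm() from MASS, which by default fits a Huber M-estimator and
downweights large residuals instead of deleting points. The simulated
data and the object names are assumptions for the example, not part of
the original question:

```r
library(MASS)

set.seed(1)                          # assumed seed, only for reproducibility
df <- data.frame(x = rnorm(100), y = rnorm(100))
df$y[40] <- df$y[40] + 30            # artificial outlier

fit_ls  <- lm(y ~ x, data = df)      # ordinary least squares
fit_rob <- rlm(y ~ x, data = df)     # robust fit (Huber psi by default)

# coefficients of both fits side by side
rbind(ls = coef(fit_ls), robust = coef(fit_rob))

# final weight given to the outlier; well below 1 means it was downweighted
fit_rob$w[40]
```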

Dieter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
