Ranney, Steven <steven.ranney <at> montana.edu> writes:

> 1) fit a simple lm(LW~LL)
> 2) calculate the dffits for those data points
> 3) remove those data points whose dffits exceed 2*sqrt(p/n) (where p=the
> number of parameters and n=number of data points; p=3 in a linear model,
> correct? Intercept, slope, and error term?)
> 4) rerun the model MINUS those data points
> 5) compare the two lm()
>
> Now, each of these steps I can do separately, but only by outputting the
> dffits to a .csv then removing the large dffits by hand, reading the .csv
> back into R, rerunning the lm(), and comparing the first lm() to the second
> lm(). I would imagine that there is a better (easier, I hope!) way of doing
> all of this. Any ideas?
>
You could do the following:

# --------------------
x = rnorm(100)
y = rnorm(100)
y[40] = y[40] + 30  # generate an outlier
df = data.frame(x=x, y=y)
lmfit1 = lm(y~x, data=df)  # fit all data
thresh = 3  # choose any data-dependent threshold
nice = abs(dffits(lmfit1)) < thresh  # nice[40] should be the only FALSE
df2 = df[nice,]  # keep only the points below the threshold
lmfit2 = lm(y~x, data=df2)  # refit without the flagged points
summary(lmfit1)
summary(lmfit2)
# --------------------

However, this is a bit Denver-Style Home-Brewery. Instead of using this
ad-hoc method, you are probably better off using one of the robust methods,
for example rlm() in MASS.

Dieter
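
For comparison, here is a minimal sketch that puts the two routes from this thread side by side: trimming at the 2*sqrt(p/n) cutoff from the original question, and a robust fit with rlm() from MASS. It assumes that p counts only the estimated coefficients (intercept and slope, so p = 2 for y ~ x; the error variance is not counted), which is the usual convention for this cutoff. The simulated data, the seed, and all variable names are illustrative only.

```r
library(MASS)  # provides rlm(); MASS ships with standard R installations

set.seed(1)                       # fixed seed so the sketch is reproducible
n <- 100
df <- data.frame(x = rnorm(n), y = rnorm(n))
df$y[40] <- df$y[40] + 30         # plant one gross outlier

fit_all <- lm(y ~ x, data = df)   # ordinary least squares on all points

# DFFITS cutoff from the question; p = number of estimated coefficients
p <- length(coef(fit_all))        # 2 for y ~ x (assumption: error term not counted)
cutoff <- 2 * sqrt(p / n)
keep <- abs(dffits(fit_all)) <= cutoff

fit_trim <- lm(y ~ x, data = df[keep, ])  # refit without the flagged points
fit_rob  <- rlm(y ~ x, data = df)         # robust fit, no points removed by hand

# Coefficients of the three fits side by side
cbind(all = coef(fit_all), trimmed = coef(fit_trim), robust = coef(fit_rob))
```

Note that this cutoff is much stricter than thresh = 3 and will typically flag several ordinary points along with the planted outlier, which is one reason a robust fit may be the less arbitrary choice.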