David,

Please allow me to digress a lot here. You are one of the few (including yours truly!) who use the phrase "shallow learning curve" to indicate difficulty of learning (I assume this is what you meant). I have always felt that "steep learning curve" is incorrect. If you plot the amount learned on the Y-axis and time on the X-axis, a steep learning curve means that one learns very quickly, which is just the opposite of what is usually meant.
Best,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
Date: Tuesday, February 8, 2011 10:09 pm
Subject: Re: [R] Removing Outliers Function
To: kirtau <kir...@live.com>
Cc: r-help@r-project.org

> On Feb 8, 2011, at 9:11 PM, kirtau wrote:
>
> > I am working on a function that will remove outliers for regression analysis.
> > I am stating that a data point is an outlier if its studentized residual is
> > above or below 3 and -3, respectively. The code below is what i have thus
> > far for the function
> >
> > x = c(1:20)
> > y = c(1,3,4,2,5,6,18,8,10,8,11,13,14,14,15,85,17,19,19,20)
> > data1 = data.frame(x,y)
> >
> >
> > rm.outliers = function(dataset,dependent,independent){
> >   dataset$predicted = predict(lm(dependent~independent))
> >   dataset$stdres = rstudent(lm(dependent~independent))
> >   m = 1
> >   for(i in 1:length(dataset$stdres)){
> >     dataset$outlier_counter[i] = if(dataset$stdres[i] >= 3 | dataset$stdres[i] <= -3) {m} else{0}
> >   }
> >   j = length(which(dataset$outlier_counter >= 1))
> >   while(j>=1){
> >     print(dataset[which(dataset$outlier_counter >= 1),])
> >     dataset = dataset[which(dataset$outlier_counter == 0),]
> >     dataset$predicted = predict(lm(dependent~independent))
> >     dataset$stdres = rstudent(lm(dependent~independent))
> >     m = m+1
> >     for(k in 1:length(dataset$stdres)){
> >       dataset$outlier_counter[k] = if(dataset$stdres[k] >= 3 | dataset$stdres[k] <= -3) {m} else{0}
> >     }
> >     j = length(which(dataset$outlier_counter >= 1))
> >   }
> >   return(dataset)
> > }
> >
> > The problem that I run into is that i receive this error when i type
> >
> > rm.outliers(data1,data1$y,data1$x)
> >
> > "    x  y predicted   stdres outlier_counter
> > 16 16 85  22.98647 24.04862               1
> > Error in `$<-.data.frame`(`*tmp*`, "predicted", value = c(0.114285714285714, :
> >   replacement has 20 rows, data has 19"
> >
> > Note: the outlier_counter variable is used to state which "round" of the
> > loop the datapoint was marked as an outlier.
> >
> > This would be a HUGE help to me and a few buddies who run a lot of different
> > regression tests.
>
> The solution is about 3 or 4 lines of code to make the function, but
> removing outliers like this is simply statistical malpractice. Maybe
> it's a good thing that R has a shallow learning curve.
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help@r-project.org mailing list
>
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
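
On the error itself, setting aside David's point about whether outliers should be removed this way at all: the quoted function calls lm(dependent~independent) on the full 20-element vectors that were passed in as arguments, so after one row is dropped from dataset the refit still returns 20 predictions for a 19-row data frame. A minimal sketch of one way around that, assuming the data contain no missing values, is to pass a formula and refit with data = dataset on each pass. The names remove_outliers, model_formula, cutoff, outlier_round and pass are made up for this illustration and do not appear in the original post.

```r
# Hypothetical sketch, not the original poster's function: refit the model on
# the rows that remain after each pass so fitted values and residuals always
# have the same length as the current data frame.
# Assumes no missing values in the model variables.
remove_outliers <- function(dataset, model_formula, cutoff = 3) {
  dataset$outlier_round <- 0L      # pass in which a row was flagged (0 = never)
  removed <- dataset[0, ]          # empty frame with the same columns
  pass <- 1L
  repeat {
    fit <- lm(model_formula, data = dataset)    # refit on the current rows only
    flagged <- abs(rstudent(fit)) >= cutoff     # same rule as the post: >= 3 or <= -3
    flagged[is.na(flagged)] <- FALSE            # undefined residuals are not flagged
    if (!any(flagged)) break
    dataset$outlier_round[flagged] <- pass
    removed <- rbind(removed, dataset[flagged, ])
    dataset <- dataset[!flagged, ]
    pass <- pass + 1L
  }
  list(kept = dataset, removed = removed)
}

# Data from the original post
x <- 1:20
y <- c(1, 3, 4, 2, 5, 6, 18, 8, 10, 8, 11, 13, 14, 14, 15, 85, 17, 19, 19, 20)
data1 <- data.frame(x, y)

result <- remove_outliers(data1, y ~ x)
result$removed   # rows dropped, with the pass in which each was flagged
```

As in the output quoted above, row 16 (y = 85) is the one flagged on the first pass. The sketch only addresses the row-count mismatch; it takes no position on whether iterative deletion is sound practice.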