Hello all,

Let me first say that this isn't a question about outliers. I am using the outlier function from the outliers package, but only because it is a convenient wrapper for finding the value that differs most from the sample mean. Where I am running into problems is that I have several groups and want to find the "outlier" within each group. Then I want to create two data.frames, one containing the "outliers" and the other with those values dropped. Both data.frames need to keep the additional columns of data that were present before the subset. The first case is easy, but I can't figure out how to do the second. So for example:
library(plyr)
library(outliers)

## A dataframe with some obviously extreme values
dfa <- data.frame(Mins=runif(15, 0, 1), Fac=rep(c("Test1","Test2","Test3"), each=5))
df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18, 4, 5); df

## Dataframe with the extreme value from each group
To_remove <- ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove

So now my question is: how can I use this dataframe (To_remove) to remove all of these values from df and create a new dataframe? That is, given a dataframe (To_remove) holding a set of rows, how can I select all rows of another dataframe (df) that are not in To_remove?

There is an rm.outlier function in this same package, but I am having trouble with it and would like to try another approach.

Thanks in advance!

Sam

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
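
One possible way to get both data.frames is sketched below. It is only a sketch under some assumptions: it uses base R's ave() to flag, within each level of Fac, the value farthest from the group mean, as a stand-in for outlier() from the outliers package rather than a call to it. The seed and the names is_extreme and df_kept are made up for this illustration. Because the split is done with a logical index on df itself, all other columns (such as Meta) are carried along in both results.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible

## Same shape of data as in the post: three groups, one extreme value each
df <- data.frame(
  Mins = c(runif(15, 0, 1), 3, 4, 5),
  Fac  = c(rep(c("Test1", "Test2", "Test3"), each = 5),
           "Test1", "Test2", "Test3")
)
df$Meta <- runif(18, 4, 5)

## Within each level of Fac, mark the value farthest from the group mean.
## ave() returns a vector aligned with the rows of df (1 = extreme, 0 = not).
flag <- ave(df$Mins, df$Fac, FUN = function(x) {
  d <- abs(x - mean(x))
  as.numeric(d == max(d))
})
is_extreme <- flag == 1

To_remove <- df[is_extreme, ]   # the "outliers", all columns kept
df_kept   <- df[!is_extreme, ]  # everything else, all columns kept
```

If To_remove has already been built with ddply as in the post, a similar result could probably be had by adding a unique row identifier column to df beforehand (say a hypothetical df$id <- seq_len(nrow(df))) and then keeping the rows whose id is not in To_remove$id, e.g. df[!df$id %in% To_remove$id, ]. Matching on an id avoids relying on the Mins values being unique across rows.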