Before using ddply, try adding an id variable that uniquely identifies each record (a good data-integrity practice anyway). Then you can create the new data frame from all the rows whose ids aren't in your 'To_remove' subset.
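Here is a minimal, self-contained base-R sketch of the same idea. It assumes a small hand-made data frame so the output is deterministic. It also swaps ddply() and outliers::outlier() for split()/lapply() and a hypothetical helper, farthest_from_mean(), which flags the value farthest from its group mean (what outlier() returns with its default arguments):

```r
## Minimal base-R sketch of the id-based approach (no plyr/outliers needed).
## Assumption: "extreme" means the value farthest from its group mean,
## which is what outliers::outlier() returns by default.
df <- data.frame(
  Mins = c(0.2, 0.4, 0.3, 3, 0.5, 0.6, 4, 0.1),
  Fac  = c("Test1", "Test1", "Test1", "Test1", "Test2", "Test2", "Test2", "Test2"),
  Meta = c(4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8)
)

## Give every record a unique id before doing anything group-wise
df$id <- seq_len(nrow(df))

## Hypothetical helper: TRUE for the value(s) farthest from the group mean
farthest_from_mean <- function(x) {
  dev <- abs(x - mean(x))
  dev == max(dev)
}

## Build To_remove one group at a time; every column (including id) is kept
To_remove <- do.call(rbind, lapply(split(df, df$Fac), function(g) {
  g[farthest_from_mean(g$Mins), ]
}))

## Keep every row whose id is not in To_remove
To_keep <- df[!(df$id %in% To_remove$id), ]

print(To_remove)
print(To_keep)
```

Because the filter matches on id rather than on Mins, the other columns (Fac, Meta) come along unchanged in both data frames. If two values in a group tie for the largest deviation, both rows end up in To_remove.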
Here's the code for your example:

library(plyr)
library(outliers)

## A dataframe with some obviously extreme values
dfa <- data.frame(Mins=runif(15, 0,1), Fac=rep(c("Test1","Test2","Test3"), each=5))
df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18,4,5)

##################################################
## add an id variable
df$id <- 1:nrow(df)

##################################################
## Dataframe with the extreme value
To_remove <- ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove

##################################################
## create dataframe without ids that are in To_remove
To_keep <- df[!(df$id %in% To_remove$id),]

## or, more compactly since in this case the ids are row numbers,
## (note: if To_remove has no rows, -To_remove$id is integer(0) and
## df[integer(0),] returns zero rows, so the %in% version is the safer default)
To_keep <- df[-To_remove$id,]

Best,
Ethan

P.S. Your email address and Google picture are so epic!

----
statisfactions.com -- the sounds of data and whimsy

On Fri, Jun 1, 2012 at 2:40 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
> Hello all,
>
> Let me first say that this isn't a question about outliers. I am using
> the outlier function from the outliers package, but only because it is
> a convenient wrapper to find the value that has the largest difference
> from the sample mean. Where I am running into problems is that I have
> several groups, and I want to calculate the "outlier" within each
> group. Then I want to create two data.frames, one with the "outliers"
> and the other with those values dropped. Both dataframes need to
> include the additional columns of data present before the subset. The
> first case is easy, but I can't seem to figure out the second.
>
> So for example:
>
> library(plyr)
> library(outliers)
>
> ## A dataframe with some obviously extreme values
> dfa <- data.frame(Mins=runif(15, 0,1),
>                   Fac=rep(c("Test1","Test2","Test3"), each=5))
> df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
> df <- rbind(dfa, df.out)
> df$Meta <- runif(18,4,5); df
>
> ## Dataframe with the extreme value
> To_remove<-ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove
>
> So now my question is how I can use this dataframe (To_remove) to
> remove all these values from df and create a new dataframe. Given a df
> (To_remove) with a list of values, how can I choose all values of
> another dataframe (df) that aren't those values in the To_remove
> dataframe? There is an rm.outlier function in this same package, but I
> am having trouble with it and would like to try another approach.
>
> Thanks in advance!
>
> Sam
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.