Re: [R] Drop values of one dataframe based on the value of another

Ethan Brown Fri, 01 Jun 2012 17:22:25 -0700

Before using ddply, try adding an id variable to uniquely identify each
record (this is a good data integrity practice anyway). Then you can simply
create the new data frame by using all the ids that aren't in your
'To_remove' subset.


Here's the code for your example:

library(plyr)
library(outliers)

## A dataframe with some obviously extreme values
dfa <- data.frame(Mins=runif(15, 0,1),
                  Fac=rep(c("Test1","Test2","Test3"), each=5))
df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
df <- rbind(dfa, df.out)
df$Meta <- runif(18,4,5)

##################################################
## add an id variable
df$id <- 1:nrow(df)
##################################################

## Dataframe with the extreme value
To_remove<-ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove

##################################################
## create dataframe without ids that are in To_remove
To_keep <- df[!(df$id %in% To_remove$id),]

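## or, more compactly, since in this case the ids are row numbers
## (only safe when To_remove has at least one row: df[-integer(0),]
## returns zero rows, not all of them)
To_keep <- df[-To_remove$id,]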
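
If you would rather not depend on plyr or outliers for this step, the same split can be sketched in base R with a logical flag computed per group. This is only a rough sketch under assumptions: farthest() is a hypothetical helper I made up to stand in for outlier() (it returns the value farthest from the group mean), and set.seed() is only there to make the example reproducible.

```r
## Self-contained sketch: base R only, no plyr or outliers needed.
set.seed(1)  # assumption: fixed seed so the random example is reproducible

## Same shape as the example data: 15 ordinary values plus one extreme per group
df <- data.frame(Mins = c(runif(15, 0, 1), 3, 4, 5),
                 Fac  = c(rep(c("Test1", "Test2", "Test3"), each = 5),
                          "Test1", "Test2", "Test3"))
df$Meta <- runif(nrow(df), 4, 5)
df$id   <- seq_len(nrow(df))

## Hypothetical stand-in for outlier(): the value farthest from the mean
farthest <- function(x) x[which.max(abs(x - mean(x)))]

## ave() applies farthest() within each Fac group and returns a vector as
## long as df$Mins, so the comparison flags each group's extreme row
is_extreme <- df$Mins == ave(df$Mins, df$Fac, FUN = farthest)

To_remove <- df[is_extreme, ]   # the "outliers", with all other columns
To_keep   <- df[!is_extreme, ]  # everything else, with all other columns

## The pitfall with negative indexing: an empty index drops every row
nothing_flagged <- df[-integer(0), ]
```

Because the flag is a logical vector, To_keep is still correct when nothing is flagged, which is not true of df[-To_remove$id,].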

Best,
Ethan

P.S. Your email address and Google picture are so epic!

----
statisfactions.com -- the sounds of data and whimsy



On Fri, Jun 1, 2012 at 2:40 PM, Sam Albers <tonightstheni...@gmail.com> wrote:

> Hello all,
>
> Let me first say that this isn't a question about outliers. I am using
> the outlier function from the outliers package, but only because it is
> a convenient wrapper to find the value that has the largest difference
> from the sample mean. Where I am running into problems is that I have
> several groups where I want to calculate the "outlier" within each
> group. Then I want to create two data.frames, one with the "outliers"
> and the other with those values dropped. Both dataframes need to
> include the additional columns of data present before the subset. The
> first case is easy, but I can't seem to figure out how to do the
> second. So for example:
>
> library(plyr)
> library(outliers)
>
> ## A dataframe with some obviously extreme values
> dfa <- data.frame(Mins=runif(15, 0,1),
> Fac=rep(c("Test1","Test2","Test3"), each=5))
> df.out <- data.frame(Mins=c(3,4,5), Fac=c("Test1","Test2","Test3"))
> df <- rbind(dfa, df.out)
> df$Meta <- runif(18,4,5); df
>
> ## Dataframe with the extreme value
> To_remove<-ddply(df, c("Fac"), subset, Mins==outlier(Mins)); To_remove
>
> So now my question is how can I use this dataframe (To_remove) to
> remove all these values from df and create a new dataframe. Given a df
> (To_remove) with a list of values, how can I choose all values of
> another dataframe (df) that aren't those values in the To_remove
> dataframe? There is a rm.outlier function in this same package, but I
> am having trouble with that and would like to try another approach.
>
> Thanks in advance!
>
> Sam
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
