On Mar 24, 2010, at 8:38 AM, Oscar Franzén wrote:

> Dear all,
>
> I'm trying to find a way to remove certain fields belonging to the same
> group from a data frame structure.
>
> I have a data frame like this:
>
> foo v1 v2 v3
> 1 1 a
> 6 2 a
> 3 8 a
> 4 4 b
> 4 4 b
> 2 1 c
> 1 6 d
>
> Each row can then be grouped according to the third column: a, b, c, d. Then
> I would like to remove all fields that belong to a group with less than X
> members, for example less than 3 members, then
> the resulting data frame structure would look like:
>
> foo v1 v2 v3
> 1 1 a
> 6 2 a
> 3 8 a
>
> Is there some simple way to do this in R?
>
> Thanks in advance.
> /Oscar
> DF
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a
4  4  4  b
5  4  4  b
6  2  1  c
7  1  6  d

> subset(DF, !v3 %in% names(which(table(v3) < 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a

The use of table() gets us:

> table(DF$v3) < 3

    a     b     c     d
FALSE  TRUE  TRUE  TRUE

followed by:

> names(which(table(DF$v3) < 3))
[1] "b" "c" "d"

which gives us the values of v3 that don't have at least 3 entries.

When using subset(), the variables are evaluated first within the data frame, hence we can drop the 'DF$' in the function call.

The use of "%in%" in subset() allows us to include or exclude certain values from a set comparison.

We could also reverse the logic, yielding the same result:

> subset(DF, v3 %in% names(which(table(v3) >= 3)))
  v1 v2 v3
1  1  1  a
2  6  2  a
3  3  8  a

See ?table, ?subset and ?"%in%" for more information.

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
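
For anyone who wants to paste and run the approach from the reply above, here is a minimal self-contained sketch. It rebuilds the example data frame by hand and wraps the same table()/%in% logic in a small function. The function name keep_large_groups and its arguments (df, group, min_n) are hypothetical and do not appear in the thread; plain indexing is used instead of subset() only so that the column name can be passed as a string.

```r
# Rebuild the example data frame from the thread.
DF <- data.frame(
  v1 = c(1, 6, 3, 4, 4, 2, 1),
  v2 = c(1, 2, 8, 4, 4, 1, 6),
  v3 = c("a", "a", "a", "b", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): keep only the rows whose
# value in column `group` occurs at least `min_n` times.
keep_large_groups <- function(df, group, min_n) {
  counts <- table(df[[group]])                       # rows per group value
  keep <- names(counts)[as.vector(counts) >= min_n]  # groups that are large enough
  df[df[[group]] %in% keep, , drop = FALSE]          # same %in% test as in the reply
}

result <- keep_large_groups(DF, "v3", 3)
print(result)
```

Changing the last argument changes the threshold X from the original question, for example keep_large_groups(DF, "v3", 2) would also retain the rows of group b.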