on 07/24/2008 09:00 AM Daniel Wagner wrote:
Dear R users,
I have a data frame with a lot of duplicate cases, and I want to delete the duplicates with lower rank and keep the case with the highest rank.
e.g.
df1
   cno rank
1 1342 0.23
2 1342 0.14
3 1342 0.56
4 2568 0.15
5 2568 0.89
So I want to keep the 3rd and 5th cases, which have the highest rank (0.56 and 0.89), and delete the rest of the duplicate cases.
Could somebody help me?
Regards
Daniel
Amsterdam
For the simple two column case, see ?aggregate:
> aggregate(df1$rank, list(cno = df1$cno), max)
   cno    x
1 1342 0.56
2 2568 0.89
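As a side note, aggregate() also has a formula interface, which keeps the original column name in the result instead of "x". A minimal sketch, assuming the data frame from the post is rebuilt as df1 (the name "best" is only for illustration):

```r
# Rebuild the example data from the original post
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Formula interface: maximum rank within each cno;
# the result column is named "rank" rather than "x"
best <- aggregate(rank ~ cno, data = df1, FUN = max)
best
#    cno rank
# 1 1342 0.56
# 2 2568 0.89
```

Either way, aggregate() returns only the grouping column and the summary value, so any other columns in the data frame are dropped.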
A more generic approach might be:
> do.call(rbind, lapply(split(df1, df1$cno),
                        function(x) x[which.max(x$rank), ]))
      cno rank
1342 1342 0.56
2568 2568 0.89
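If the original row names matter (the post asks for the 3rd and 5th cases), one possible alternative is to compute the per-group maximum with ave() and keep the rows that match it. This is only a sketch, and the names "group_max" and "kept" are illustrative:

```r
# Rebuild the example data from the original post
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# ave() returns, for every row, the maximum rank within that row's cno group
group_max <- ave(df1$rank, df1$cno, FUN = max)

# Keep the rows whose rank equals their group's maximum
kept <- df1[df1$rank == group_max, ]
kept
#    cno rank
# 3 1342 0.56
# 5 2568 0.89
```

Note that this keeps every row tied for the maximum within a group, whereas which.max() in the split()/lapply() version returns only the first one.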
For example, using the iris dataset, get the rows, by Species, with the
highest Sepal.Length:
> do.call(rbind, lapply(split(iris, iris$Species),
                        function(x) x[which.max(x$Sepal.Length), ]))
           Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica
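Another way to get the same rows, which may be easier to read on larger data frames, is to sort so that the best row comes first within each group and then drop the later rows with duplicated(). A rough sketch on the same iris data (the names "sorted" and "top" are just for illustration):

```r
# Sort by Species, then by descending Sepal.Length within each species
sorted <- iris[order(iris$Species, -iris$Sepal.Length), ]

# duplicated() flags every row after the first one seen for each species,
# so negating it keeps only the top row per species
top <- sorted[!duplicated(sorted$Species), ]
top
```

Here the result keeps the original row names (15, 51 and 132) rather than being labelled by Species, and like which.max() it keeps only the first row when there is a tie.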
HTH,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help