On 07/24/2008 09:00 AM Daniel Wagner wrote:
Dear R users,
I have a data frame with a lot of duplicate cases. I want to delete the duplicates with a lower rank and keep only the case with the highest rank.
e.g.:
df1
  cno      rank
1  1342    0.23
2  1342    0.14
3  1342    0.56
4  2568    0.15
5  2568    0.89
So I want to keep the 3rd and 5th cases, which have the highest ranks (0.56 and 0.89), and delete the rest of the duplicates.
Could somebody help me?
Regards,
Daniel
Amsterdam

For the simple two-column case, see ?aggregate:

> aggregate(df1$rank, list(cno = df1$cno), max)
   cno    x
1 1342 0.56
2 2568 0.89
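In newer R versions, aggregate() also has a formula method, which keeps the original column name instead of "x". The sketch below assumes such a version, rebuilds the example data from the question as df1, and takes the maximum rank per cno:

```r
# Rebuild the example data frame from the original question
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Formula interface: group rank by cno and take the maximum;
# the second column of the result is named "rank" rather than "x"
res <- aggregate(rank ~ cno, data = df1, FUN = max)
print(res)
```

The values should match the output above, with the second column named rank.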


A more generic approach, which returns every column of the highest-ranked row for each cno, might be:

> do.call(rbind, lapply(split(df1, df1$cno),
                        function(x) x[which.max(x$rank), ]))
      cno rank
1342 1342 0.56
2568 2568 0.89
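Another base R idiom, different from the split() approach above, is to sort and then drop duplicates: order the rows by cno and by decreasing rank, then keep the first row of each cno with !duplicated(). A minimal sketch, again rebuilding df1 from the question:

```r
df1 <- data.frame(cno  = c(1342, 1342, 1342, 2568, 2568),
                  rank = c(0.23, 0.14, 0.56, 0.15, 0.89))

# Sort by cno, then by rank in decreasing order, so the highest-ranked
# row of each cno comes first within its group
sorted <- df1[order(df1$cno, -df1$rank), ]

# duplicated() flags every repeat of a cno after its first occurrence;
# keeping the non-duplicated rows leaves the top-ranked row per cno
best <- sorted[!duplicated(sorted$cno), ]
print(best)
```

Because the original row names are preserved, the kept rows are labelled 3 and 5, i.e. the 3rd and 5th cases from the question. Like which.max(), this keeps only one row per cno when ranks are tied.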


For example, using the iris dataset, get the rows, by Species, with the highest Sepal.Length:


> do.call(rbind, lapply(split(iris, iris$Species),
                        function(x) x[which.max(x$Sepal.Length), ]))
           Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
setosa              5.8         4.0          1.2         0.2     setosa
versicolor          7.0         3.2          4.7         1.4 versicolor
virginica           7.9         3.8          6.4         2.0  virginica
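Note that which.max() returns only the first maximum within each group. If tied rows should all be kept, one option is to compare each value with its group maximum computed by ave(); a sketch on the same iris example:

```r
# For each row, compute the maximum Sepal.Length within its Species
grp_max <- ave(iris$Sepal.Length, iris$Species, FUN = max)

# Keep every row equal to its group's maximum (all tied rows are kept,
# whereas which.max() would return only the first one)
top <- iris[iris$Sepal.Length == grp_max, ]
print(top)
```

In iris each species has a single maximum, so this returns the same three rows as above, but with their original row numbers as row names.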


HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.