Re: [R] Subsetting for the ten highest values by group in a dataframe

Hadley Wickham Sat, 28 Jan 2012 05:57:51 -0800

On Fri, Jan 27, 2012 at 1:26 PM, Sam Albers <tonightstheni...@gmail.com> wrote:
> Hello,
>
> I am looking for a way to subset a data frame by choosing the ten
> highest values of a variable. This also needs to happen within each
> level of a factor.
>
> ## I've used plyr here but I'm not married to this approach
> require(plyr)
>
> ## I've created a data.frame with two groups and then an id variable (y)
> df <- data.frame(x=rnorm(400, mean=20), y=1:400, z=c("A","B"))
>
> ## So using ddply I can find the highest value of x
> df.max1 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1])
>
> ## Or the 2nd highest value
> df.max2 <- ddply(df, c("z"), subset, x==sort(x, TRUE)[2])
>
> ## And so on.... but when I try to make a series of numbers like so
> ## to get the top ten values, I don't get a warning message but
> ## two values that don't really make sense to me
> df.max <- ddply(df, c("z"), subset, x==sort(x, TRUE)[1:10])


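Well, sort(x, TRUE)[1:10] returns a vector of ten values, and == compares
it with x element by element, recycling the ten values along x. Each
group has 200 rows, a multiple of 10, so the recycling is silent and you
only keep rows whose position happens to line up with the matching
value. You probably want

df.max <- ddply(df, c("z"), subset, x %in% sort(x, TRUE)[1:10])

but it would be better to do (e.g.)

df.max <- ddply(df, c("z"), subset, rank(-x) <= 10)

Note the minus sign: rank() ranks in ascending order, so rank(x) <= 10
would give the ten lowest values. Using rank() also makes it possible to
deal with ties in a principled way, via its ties.method argument.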
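
For what it's worth, here is a minimal base R sketch of the two points
above: how == recycles the sorted values, and how ranking on -x picks the
top ten per group. The seed, the helper name top_n_by_group, and the
choice of ties.method = "min" are my own assumptions for illustration,
not something from the original question, and ave() stands in for ddply()
only so the example runs without plyr.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rnorm(400, mean = 20), y = 1:400, z = c("A", "B"))

# Look at one group to see what the two comparisons do.
xa <- df$x[df$z == "A"]
top10 <- sort(xa, decreasing = TRUE)[1:10]

# `==` recycles the ten sorted values along the 200 values of xa, so a row
# is kept only when its position lines up with the matching sorted value.
n_eq <- sum(xa == top10)

# `%in%` tests membership instead, so every row holding a top-ten value is kept.
n_in <- sum(xa %in% top10)

# Hypothetical helper: keep the n largest x within each level of z.
# Ranking -x gives rank 1 to the largest value; ties.method = "min" keeps
# every row that is tied at the cutoff.
top_n_by_group <- function(d, n = 10) {
  r <- ave(-d$x, d$z, FUN = function(v) rank(v, ties.method = "min"))
  d[r <= n, ]
}

df.max <- top_n_by_group(df)
table(df.max$z)
```

With ties.method = "min" a tie at the tenth place returns more than ten
rows for that group; ties.method = "first" would instead break ties by
position and always return exactly ten.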

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

