All the solutions in this thread so far use the lapply(split(...)) paradigm either directly or indirectly. That paradigm doesn't scale. That's the likely source of quite a few 'out of memory' errors and performance issues in R.
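For reference, the lapply(split(...)) pattern looks roughly like the sketch below in base R. It assumes the task is "top 5 values per group"; the names df, pieces, top5 and result are made up for illustration and are not taken from the earlier replies. Each step copies data, which is one plausible reason the pattern gets slow and memory-hungry as the number of groups grows.

```r
# Made-up data: two groups of 20 values each (names are illustrative only).
set.seed(1)
df <- data.frame(index = gl(2, 20), foo = rnorm(40))

# split() copies the data frame into one piece per group ...
pieces <- split(df, df$index)

# ... lapply() builds a new, smaller data frame for each piece
# (here: the 5 rows with the largest foo) ...
top5 <- lapply(pieces, function(d) d[head(order(-d$foo), 5), ])

# ... and rbind() copies everything once more to reassemble the result.
result <- do.call(rbind, top5)
```

With two groups this is harmless; the cost only becomes noticeable when there are many groups or the pieces are large.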
data.table doesn't do that internally, and its syntax is pretty easy.

    > tmp <- data.table(index = gl(2,20), foo = rnorm(40))
    > tmp[, .SD[head(order(-foo),5)], by=index]
          index index.1       foo
     [1,]     1       1 1.9677303
     [2,]     1       1 1.2731872
     [3,]     1       1 1.1100931
     [4,]     1       1 0.8194719
     [5,]     1       1 0.6674880
     [6,]     2       2 1.2236383
     [7,]     2       2 0.9606766
     [8,]     2       2 0.8654497
     [9,]     2       2 0.5404112
    [10,]     2       2 0.3373457
    >

As you can see, it currently repeats the group column, which is a shame (fixing that is on the to-do list).

Matthew

http://datatable.r-forge.r-project.org/