On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle <mdo...@mdowle.plus.com> wrote:
>
> All the solutions in this thread so far use the lapply(split(...)) paradigm
> either directly or indirectly. That paradigm doesn't scale. That's the likely
> source of quite a few 'out of memory' errors and performance issues in R.
This is a good point. It is not nearly as straightforward as the syntax for
data.table (which seems to order and select in one step...very nice!), but
this should be less memory intensive:

tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
# sort by index, then by foo (ascending) within each level
tmp <- tmp[order(tmp$index, tmp$foo) , ]
# find location of first instance of each level and add 0:4 to it
x <- sapply(match(levels(tmp$index), tmp$index), `+`, 0:4)
tmp[x, ]

>
> data.table doesn't do that internally, and its syntax is pretty easy.
>
> > tmp <- data.table(index = gl(2,20), foo = rnorm(40))
> > tmp[, .SD[head(order(-foo),5)], by=index]
>       index index.1       foo
>  [1,]     1       1 1.9677303
>  [2,]     1       1 1.2731872
>  [3,]     1       1 1.1100931
>  [4,]     1       1 0.8194719
>  [5,]     1       1 0.6674880
>  [6,]     2       2 1.2236383
>  [7,]     2       2 0.9606766
>  [8,]     2       2 0.8654497
>  [9,]     2       2 0.5404112
> [10,]     2       2 0.3373457
>
> As you can see it currently repeats the group column, which is a
> shame (on the to-do list to fix).
>
> Matthew
>
> http://datatable.r-forge.r-project.org/
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Sorting-and-subsetting-tp2547360p2548319.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
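For anyone who wants to try the sort-then-match idea, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: the helper name top_n_by_sort and the seed are made up for this example, foo is sorted in descending order so the result lines up with the head(order(-foo), 5) call in the quoted data.table example, and every level of index is assumed to have at least n rows.

```r
# Sketch of "sort, find the first row of each level, add offsets".
# top_n_by_sort is a hypothetical helper name, not from the thread.
set.seed(1)  # arbitrary seed, only for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))

top_n_by_sort <- function(d, n = 5) {
  # sort by group, then by foo descending within each group
  d <- d[order(d$index, -d$foo), ]
  # row position of the first occurrence of each level
  first <- match(levels(d$index), d$index)
  # offsets 0..(n - 1) from each start; as.vector() flattens the matrix
  # column by column, so the rows stay grouped by level
  rows <- as.vector(outer(0:(n - 1), first, `+`))
  # assumption: each level has at least n rows, otherwise the offsets
  # run into the next level (or past the end of the data frame)
  d[rows, ]
}

res <- top_n_by_sort(tmp, 5)
res
```

res should hold five rows per level of index, largest foo first. A level with fewer than n rows would break the offset trick, so a more careful version would check group sizes before indexing.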