Probably true, that's cunning, but look at base::match. The first thing it does is coerce the factor to character, which needs an internal allocate-and-copy. data.table doesn't do that either; see data.table:::sortedmatch.
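The R documentation for match states that factors are converted to character vectors before matching. Below is a minimal sketch of one way to avoid that step when all you need is the first row of each level. It assumes a factor whose rows are already grouped, built from hypothetical data with gl(), and it matches on the integer codes instead. This only sidesteps the character conversion. It is not data.table's sortedmatch, and as.integer() may still copy the vector.

```r
# Hypothetical example data: a factor with 2 levels, 20 rows each,
# already grouped (as it would be after sorting by the factor).
f <- gl(2, 20)

# base::match converts the factor to character before matching.
first_chr <- match(levels(f), f)

# Sketch: level k is stored as integer code k, so matching 1..nlevels
# against the codes finds the first row of each level without
# building character vectors.
first_int <- match(seq_along(levels(f)), as.integer(f))

print(first_int)
```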
I've taken some first basic steps towards a proper, reproducible test suite (timings.Rnw); the PDF is on the homepage. Perhaps this example could be added there. One test is 340 times faster and the other is 13 times faster. More examples would be good.

Matthew

http://datatable.r-forge.r-project.org/

"Joshua Wiley" <jwiley.ps...@gmail.com> wrote in message news:aanlktimyuvl9suj65ktzqvpnyn+ep8ubu3mxxhhrd...@mail.gmail.com...
> On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle <mdo...@mdowle.plus.com>
> wrote:
>>
>>
>> All the solutions in this thread so far use the lapply(split(...))
>> paradigm, either directly or indirectly. That paradigm doesn't scale.
>> It's the likely source of quite a few 'out of memory' errors and
>> performance issues in R.
>
> This is a good point. It is not nearly as straightforward as the
> syntax for data.table (which seems to order and select in one
> step... very nice!), but this should be less memory intensive:
>
> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
> tmp <- tmp[order(tmp$index, tmp$foo) , ]
>
> # find location of first instance of each level and add 0:4 to it
> x <- sapply(match(levels(tmp$index), tmp$index), `+`, 0:4)
>
> tmp[x, ]
>
>>
>> data.table doesn't do that internally, and its syntax is pretty easy.
>>
>>> tmp <- data.table(index = gl(2,20), foo = rnorm(40))
>>
>>> tmp[, .SD[head(order(-foo),5)], by=index]
>>       index index.1       foo
>>  [1,]     1       1 1.9677303
>>  [2,]     1       1 1.2731872
>>  [3,]     1       1 1.1100931
>>  [4,]     1       1 0.8194719
>>  [5,]     1       1 0.6674880
>>  [6,]     2       2 1.2236383
>>  [7,]     2       2 0.9606766
>>  [8,]     2       2 0.8654497
>>  [9,]     2       2 0.5404112
>> [10,]     2       2 0.3373457
>>>
>>
>> As you can see, it currently repeats the group column, which is a
>> shame (on the to-do list to fix).
>>
>> Matthew
>>
>> http://datatable.r-forge.r-project.org/
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Sorting-and-subsetting-tp2547360p2548319.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
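For comparison with the thread's examples, here is a minimal base R sketch of "top 5 rows per group" that also avoids lapply(split(...)). It uses a single sort plus a within-group position counter. The data are hypothetical (same shape as the thread's, with a fixed seed). The sort uses -tmp$foo, so it keeps the largest five per group like the data.table example's order(-foo). Joshua's order(tmp$index, tmp$foo) sorts ascending, so his version keeps the smallest five. This is an illustration under those assumptions, not a benchmarked alternative.

```r
# Hypothetical example data matching the thread: 2 groups of 20 rows.
set.seed(42)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))

# Sort by group, then by foo descending (largest first within each group).
tmp <- tmp[order(tmp$index, -tmp$foo), ]

# Position of each row within its group: after sorting, each group is a
# contiguous run, so sequence() over the run lengths counts 1, 2, ...
# afresh in every group.
pos <- sequence(rle(as.integer(tmp$index))$lengths)

# Keep at most the first 5 rows of each group.
top5 <- tmp[pos <= 5, ]
print(top5)
```

One design point: adding 0:4 to each group's first row assumes every group has at least 5 rows. A smaller group would spill into the next group, or index past the last row. Filtering on pos <= 5 simply returns the whole group in that case.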