Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
See data.table:::duplist, which does that (or at least something very similar) in C, for multiple columns too.
Matthew
http://datatable.r-forge.r-project.org/
"peter dalgaard" wrote in message news:660991c3-b52b-4d58-b819-eadc95ecc...@gmail.com...
> On Sep 21, 2010, at 16:27 , Joshua Wiley wrote:

Re: [R] Sorting and subsetting

2010-09-21 Thread peter dalgaard
On Sep 21, 2010, at 16:27 , Joshua Wiley wrote:
> On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle wrote:
>> All the solutions in this thread so far use the lapply(split(...)) paradigm
>> either directly or indirectly. That paradigm doesn't scale. That's the likely
>> source of quite a

Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
Probably true, that's cunning, but look at base::match. The first thing it does is coerce factor to character (which needs an allocation and a copy internally). data.table doesn't do that either; see data.table:::sortedmatch. I made the first basic steps towards a proper reproducible test suite (timings.Rnw).

Re: [R] Sorting and subsetting

2010-09-21 Thread Joshua Wiley
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle wrote:
> All the solutions in this thread so far use the lapply(split(...)) paradigm
> either directly or indirectly. That paradigm doesn't scale. That's the likely
> source of quite a few 'out of memory' errors and performance issues in R.
Thi

Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
All the solutions in this thread so far use the lapply(split(...)) paradigm either directly or indirectly. That paradigm doesn't scale. That's the likely source of quite a few 'out of memory' errors and performance issues in R. data.table doesn't do that internally, and its syntax is pretty eas
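For readers who want to see what avoiding lapply(split(...)) can look like, here is a minimal base-R sketch. It is not data.table's internal method, only an illustration under stated assumptions: the data frame `tmp` mirrors the example data posted elsewhere in this thread, and the names `n`, `sorted`, `sizes`, `pos` and `top_n` are hypothetical. It sorts the whole table once and computes each row's position within its group with rle() and sequence(), so no per-group data frames are created.

```r
# Hypothetical example data, modelled on the 'tmp' used in this thread
set.seed(7)
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))
n <- 5  # rows wanted per group (assumed value)

# One sort of the whole table: by group, then by value within group
sorted <- tmp[order(tmp$index, tmp$foo), ]

# Run lengths of the sorted group column give the group sizes
sizes <- rle(as.integer(sorted$index))$lengths

# sequence() turns the sizes into within-group positions 1..size
pos <- sequence(sizes)

# Keep the first n rows of every group in a single vectorised subset
top_n <- sorted[pos <= n, ]
```

Whether this is actually faster than the split-based versions on a given data set is something to measure rather than assume.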

Re: [R] Sorting and subsetting

2010-09-20 Thread Peter Dalgaard
On 09/20/2010 08:01 PM, David Winsemius wrote:
>      index        foo
> 1.6      1 -3.0267759
> 1.7      1 -1.3725536
> 1.19     1 -1.1476048
> 1.16     1 -1.0963967
> 1.2      1 -1.0684793
> 2.29     2 -1.6601486
> 2.21     2 -1.2633632
> 2.22     2 -0.9875626
> 2.38     2 -0.9515301
> 2.30

Re: [R] Sorting and subsetting

2010-09-20 Thread Joshua Wiley
On Mon, Sep 20, 2010 at 11:15 AM, David Winsemius wrote:
> On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
>> On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
>>> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector wrote:
>>>> Harold -
>>>> Two ways that come to mind:

Re: [R] Sorting and subsetting

2010-09-20 Thread William Dunlap
Richard Tan asked a very similar question last week ('get top n rows group by a column from a dataframe'). You could use ave() to make a sequence-number-within-group vector and choose rows with a small enough value there:
   tmp[ave(integer(nrow(tmp)), tmp$index, FUN=seq_along)<=N, ]
If there are f
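A small worked version of the ave() idea above, as a sketch only: the example data and the value of `N` are assumptions, and the names `pos` and `top` are hypothetical. The rows are sorted first so that "first N" means "smallest N values of foo" within each group.

```r
# Hypothetical example data, modelled on the 'tmp' used in this thread
set.seed(1)
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))
N <- 5  # assumed number of rows to keep per group

# Sort by group, then by value within group
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Sequence number within each group: 1, 2, 3, ... restarting per level of index
pos <- ave(integer(nrow(tmp)), tmp$index, FUN = seq_along)

# Keep the rows whose within-group sequence number is small enough
top <- tmp[pos <= N, ]
```

Because ave() returns its result in the original row order, this form does not depend on the rows of a group being adjacent; the sort only decides which rows count as the first N.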

Re: [R] Sorting and subsetting

2010-09-20 Thread David Winsemius
On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
> On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
>> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector wrote:
>>> Harold -
>>> Two ways that come to mind:
>>> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
>>> 2) subset(tmp,unlist(tapply(fo

Re: [R] Sorting and subsetting

2010-09-20 Thread David Winsemius
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector wrote:
>> Harold -
>> Two ways that come to mind:
>> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
>> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
> 3) do.call(rbind, by(tmp, tmp$index

Re: [R] Sorting and subsetting

2010-09-20 Thread Peter Dalgaard
On 09/20/2010 07:16 PM, Doran, Harold wrote:
> tmp1 <- tmp1[1:5,]
> tmp2 <- tmp2[1:5,]
> result <- rbind(tmp1, tmp2)
>
> Does anyone see a way to subset and subsequently bind without a loop?
>
> do.call(rbind,lapply(split(tmp,tmp$index),head,5))
     index        foo
1.11     1 -1.5124909
1.10

Re: [R] Sorting and subsetting

2010-09-20 Thread Joshua Wiley
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector wrote:
> Harold -
>   Two ways that come to mind:
>
> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
Josh

Re: [R] Sorting and subsetting

2010-09-20 Thread Tal Galili
Hi Harold,
I thought of one way to do this, but maybe (probably) there is a faster way:
tmp <- data.frame(index = gl(3,20), foo = rnorm(60))
subset.first.x.elements <- function(INDEX, num.of.elements = 5) {
  t.INDEX <- table(factor(INDEX, levels = unique(INDEX)))
  running.indexes <- unlist(sappl

Re: [R] Sorting and subsetting

2010-09-20 Thread Doran, Harold
Very nice, Phil. Thank you.
-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Monday, September 20, 2010 1:28 PM
To: Doran, Harold
Cc: R-help
Subject: Re: [R] Sorting and subsetting
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp

Re: [R] Sorting and subsetting

2010-09-20 Thread Phil Spector
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
- Phil Spector
  Statistical Computing Facility
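The two ways above can be tried side by side with a short sketch. The example data are an assumption modelled on the `tmp` posted elsewhere in this thread, and `way1` and `way2` are hypothetical names. One deliberate change from the original: seq_along() is used in place of seq(), because seq() applied to a group holding a single number would count up to that number rather than return 1. Way 2 also relies on the rows already being grouped by index, so that the unlist() order matches the row order.

```r
# Hypothetical example data, modelled on the 'tmp' used in this thread
set.seed(42)
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))

# Sort by group and by value, so the first 5 rows per group are the 5 smallest
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Way 1: split into per-group data frames, take rows 1:5 of each, bind them back
way1 <- do.call(rbind, lapply(split(tmp, tmp$index), function(x) x[1:5, ]))

# Way 2: build a within-group counter and keep rows where it is at most 5
# (seq_along swapped in for seq; assumes rows are grouped by index)
way2 <- subset(tmp, unlist(tapply(foo, index, seq_along)) <= 5)
```

Both results hold the same 15 rows; they differ only in row names, since rbind() over the split list prefixes each row name with its group label.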