Re: [R] aggregate() runs out of memory

2012-11-27 Sam Steingold
> * Steve Lianoglou [2012-11-27 12:53:23 -0500]:
> On Tue, Nov 27, 2012 at 11:29 AM, Sam Steingold wrote:
>>> * Steve Lianoglou [2012-11-26 19:47:25 -0500]:
> [snip]
>>> It just occurred to me that this is even better:
>>>
>>> R> setkeyv(f, c("share.id", "delay"))
>>> R> result <- f[, l

Re: [R] aggregate() runs out of memory

2012-11-27 Steve Lianoglou
Hi,

On Tue, Nov 27, 2012 at 11:29 AM, Sam Steingold wrote:
>> * Steve Lianoglou [2012-11-26 19:47:25 -0500]:
[snip]
>> It just occurred to me that this is even better:
>>
>> R> setkeyv(f, c("share.id", "delay"))
>> R> result <- f[, list(min=delay[1L], max=delay[.N], count=.N,
>> country=cou
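A minimal sketch of the keyed first/last idea, assuming the data.table package is installed; the table `f`, its values and the helper variable `check` are illustrative only. Because `setkeyv()` sorts by `share.id` and then by `delay` within each id, the first and last `delay` in each group should be its minimum and maximum, so no per-group `min()`/`max()` call is needed.

```r
# Sketch only: assumes the data.table package is installed.
# `f` is hypothetical toy data, not the poster's real table.
library(data.table)

f <- data.table(share.id = c("b", "a", "b", "a", "b"),
                delay    = c(5L, 3L, 1L, 9L, 4L),
                country  = c("US", "FR", "US", "FR", "US"))

# Sort by share.id, then by delay within each share.id (by reference).
setkeyv(f, c("share.id", "delay"))

# After sorting, delay[1L] is the group minimum and delay[.N] the maximum.
result <- f[, list(min = delay[1L], max = delay[.N],
                   count = .N, country = country[1L]),
            by = "share.id"]

# The same summary computed directly with min()/max(), for comparison.
check <- f[, list(min = min(delay), max = max(delay)), by = "share.id"]
print(result)
```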

Re: [R] aggregate() runs out of memory

2012-11-27 Sam Steingold
> * Steve Lianoglou [2012-11-26 19:47:25 -0500]:
>
> On Monday, November 26, 2012, Sam Steingold wrote:
> [snip]
>
>>
>> there is precisely one country for each id.
>> i.e., unique(country) is the same as country[1].
>> thanks a lot for the suggestion!
>>
>> > R> result <- f[, list(min=min(dela

Re: [R] aggregate() runs out of memory

2012-11-26 Steve Lianoglou
On Monday, November 26, 2012, Sam Steingold wrote:
[snip]
>
> there is precisely one country for each id.
> i.e., unique(country) is the same as country[1].
> thanks a lot for the suggestion!
>
> > R> result <- f[, list(min=min(delay), max=max(delay),
> > count=.N,country=country[1L]), by="share.i
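Taking `country[1L]` relies on Sam's statement that each id has exactly one country. A minimal sketch, assuming data.table is installed and using hypothetical toy data (the variable `n.countries` is illustrative), that checks this with `uniqueN()` before computing the per-id summary:

```r
# Sketch only: assumes the data.table package is installed.
library(data.table)

# Hypothetical toy data standing in for the real table.
f <- data.table(share.id = c(1L, 2L, 1L, 2L, 3L),
                country  = c(6L, 7L, 6L, 7L, 8L),
                delay    = c(4L, 2L, 1L, 9L, 3L))

# Count distinct countries per id; all ones means country[1L] is safe.
n.countries <- f[, list(n = uniqueN(country)), by = "share.id"]
stopifnot(all(n.countries$n == 1L))

# Per-id minimum, maximum, row count and (single) country.
result <- f[, list(min = min(delay), max = max(delay),
                   count = .N, country = country[1L]),
            by = "share.id"]
print(result)
```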

Re: [R] aggregate() runs out of memory

2012-11-26 Sam Steingold
Hi,

> * Steve Lianoglou [2012-11-26 17:32:21 -0500]:
>
>> --8<---cut here---start->8---
>>> f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
>>> f
>>    id country delay
>>  1  1       6     1
>>  2  2       7     2
>>  3  3       8     3
>>  4
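For comparison, a base R sketch (no data.table) on Sam's toy data frame, computing the per-id minimum, maximum, count and first country with `tapply()`, `tabulate()` and `match()`. The helper name `grp` is illustrative, and this approach has not been benchmarked against the data.table versions discussed in this thread on the 10-million-row data.

```r
# Sam's toy data from the message above.
f <- data.frame(id = rep(1:3, 4), country = rep(6:8, 4), delay = 1:12)

# Build the grouping factor once and reuse it for every summary.
grp <- factor(f$id)

result <- data.frame(
  id      = as.integer(levels(grp)),
  min     = as.vector(tapply(f$delay, grp, min)),
  max     = as.vector(tapply(f$delay, grp, max)),
  # tabulate() counts rows per factor level.
  count   = tabulate(grp, nbins = nlevels(grp)),
  # match() finds the first row of each id; take its country.
  country = f$country[match(levels(grp), grp)]
)
print(result)
```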

Re: [R] aggregate() runs out of memory

2012-11-26 Steve Lianoglou
Hi,

On Mon, Nov 26, 2012 at 4:57 PM, Sam Steingold wrote:
[snip]
>> Could you please copy paste the output of `(head(infl, 20))` as
>> well as an approximation of what the result is that you want.

Don't know how "dput" got clipped in your reply from the quoted text I
wrote, but I actually asked
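`dput()` prints R code that rebuilds an object exactly, which is why it is the usual way to share sample data on the list. A minimal sketch with a hypothetical three-row stand-in for `infl` (its columns are assumed; the real column set is not shown in the thread), including a round-trip check:

```r
# Hypothetical stand-in for `infl`.
infl <- data.frame(share.id = c("a", "a", "b"),
                   country  = c("US", "US", "FR"),
                   delay    = c(3L, 7L, 1L),
                   stringsAsFactors = FALSE)

# Print code that recreates the first 20 rows; paste this into the email.
dput(head(infl, 20))

# Round trip: evaluating the printed text gives back an equal data frame.
txt  <- paste(capture.output(dput(head(infl, 20))), collapse = "\n")
copy <- eval(parse(text = txt))
```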

Re: [R] aggregate() runs out of memory

2012-11-26 Sam Steingold
hi Steve,

> * Steve Lianoglou [2012-11-26 16:08:59 -0500]:
> On Mon, Nov 26, 2012 at 3:13 PM, Sam Steingold wrote:
>>> * Steve Lianoglou [2012-11-19 13:30:03 -0800]:
>>>
>>> For instance, if you want the min and max of `delay` within each group
>>> defined by `share.id`, and let's assum

Re: [R] aggregate() runs out of memory

2012-11-26 Steve Lianoglou
Hi Sam,

On Mon, Nov 26, 2012 at 3:13 PM, Sam Steingold wrote:
> Hi,
>
>> * Steve Lianoglou [2012-11-19 13:30:03 -0800]:
>>
>> For instance, if you want the min and max of `delay` within each group
>> defined by `share.id`, and let's assume `infl` is a data.frame, you
>> can do something like

Re: [R] aggregate() runs out of memory

2012-11-26 Sam Steingold
Hi,

> * Steve Lianoglou [2012-11-19 13:30:03 -0800]:
>
> For instance, if you want the min and max of `delay` within each group
> defined by `share.id`, and let's assume `infl` is a data.frame, you
> can do something like so:
>
> R> as.data.table(infl)
> R> setkey(infl, share.id)
> R> result <
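In the quoted snippet, `as.data.table(infl)` returns a new data.table and leaves `infl` itself a data.frame, while `setkey()` requires a data.table. A minimal sketch, assuming data.table is installed and using a hypothetical stand-in for `infl`, that assigns the conversion first and then computes the per-group min and max:

```r
# Sketch only: assumes the data.table package is installed.
library(data.table)

# Hypothetical stand-in for the data.frame `infl`.
infl <- data.frame(share.id = c("b", "a", "b", "a"),
                   delay    = c(8L, 2L, 5L, 6L),
                   stringsAsFactors = FALSE)

# as.data.table() does not modify its argument, so assign the result.
# (setDT(infl) would instead convert infl in place, by reference.)
infl <- as.data.table(infl)
setkey(infl, share.id)

# data.table analogue of aggregate(..., FUN = min) and FUN = max together.
result <- infl[, list(min = min(delay), max = max(delay)), by = share.id]
print(result)
```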

Re: [R] aggregate() runs out of memory

2012-11-19 David Winsemius
On Nov 19, 2012, at 1:25 PM, Sam Steingold wrote:

> Thanks Steve,
> what is the analogue of .N for min and max?

?seq

> i.e., what is the data.table's version of
> aggregate(infl$delay,by=list(infl$share.id),FUN=min)
> aggregate(infl$delay,by=list(infl$share.id),FUN=max)

DT[, list( max(v)),

Re: [R] aggregate() runs out of memory

2012-11-19 Steve Lianoglou
Hi,

On Mon, Nov 19, 2012 at 1:25 PM, Sam Steingold wrote:
> Thanks Steve,
> what is the analogue of .N for min and max?
> i.e., what is the data.table's version of
> aggregate(infl$delay,by=list(infl$share.id),FUN=min)
> aggregate(infl$delay,by=list(infl$share.id),FUN=max)
> thanks!

It would be

Re: [R] aggregate() runs out of memory

2012-11-19 Sam Steingold
Thanks Steve,
what is the analogue of .N for min and max?
i.e., what is the data.table's version of
aggregate(infl$delay,by=list(infl$share.id),FUN=min)
aggregate(infl$delay,by=list(infl$share.id),FUN=max)
thanks!
Sam.

On Fri, Sep 14, 2012 at 3:40 PM, Steve Lianoglou wrote:
> Hi,
>
> On Fri, Sep

Re: [R] aggregate() runs out of memory

2012-09-14 Steve Lianoglou
Hi,

On Fri, Sep 14, 2012 at 4:26 PM, Dennis Murphy wrote:
> Hi:
>
> This should give you some idea of what Steve is talking about:
>
> library(data.table)
> dt <- data.table(x = sample(10, 1000, replace = TRUE),
>                  y = rnorm(1000), key = "x")
> dt[, .N, by = x]
> syst

Re: [R] aggregate() runs out of memory

2012-09-14 Dennis Murphy
Hi:

This should give you some idea of what Steve is talking about:

library(data.table)
dt <- data.table(x = sample(10, 1000, replace = TRUE),
                 y = rnorm(1000), key = "x")
dt[, .N, by = x]
system.time(dt[, .N, by = x])

...on my system, dual core 8 GB RAM running Win7 6
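Applied to the original question, `.N` gives the number of rows in each `V2` group, and tabulating those counts gives the distribution Sam asked for. A minimal sketch with a hypothetical seven-row `Z`, assuming data.table is installed. One difference to keep in mind: `aggregate()` drops rows whose `by` value is `NA`, whereas data.table's `by` keeps `NA` as its own group, so real data with missing `V2` may need those rows filtered first.

```r
# Sketch only: assumes the data.table package is installed.
library(data.table)

# Hypothetical small version of Z; V2 plays the role of the id column.
Z <- data.table(V1 = letters[1:7],
                V2 = c("p", "q", "p", "r", "q", "p", "s"))

# Rows per V2 value, then how many ids have each row count.
dist <- Z[, .N, by = V2][, table(N)]
print(dist)
```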

Re: [R] aggregate() runs out of memory

2012-09-14 William Dunlap
n...@r-project.org] On Behalf Of Steve Lianoglou
> Sent: Friday, September 14, 2012 12:41 PM
> To: s...@gnu.org; r-help@r-project.org
> Subject: Re: [R] aggregate() runs out of memory
>
> Hi,
>
> On Fri, Sep 14, 2012 at 3:26 PM, Sam Steingold wrote:
> > I h

Re: [R] aggregate() runs out of memory

2012-09-14 Steve Lianoglou
Hi,

On Fri, Sep 14, 2012 at 3:26 PM, Sam Steingold wrote:
> I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17
> columns).
> I want to get the result of
> table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
> alas, aggregate has been running for ~30 minutes, RSS is 14G,

[R] aggregate() runs out of memory

2012-09-14 Sam Steingold
I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17
columns).
I want to get the result of

table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)

alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is
24.3G, and no end in sight.
both V1 and V2 are characters (not
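The requested result is the distribution of row counts per `V2` value, which base R can also get from two nested `table()` calls without going through `aggregate()`. A minimal sketch on a hypothetical small `Z` (the names `slow` and `fast` are illustrative) that checks both expressions agree. Avoiding `aggregate()`'s per-group splitting should need much less memory on the real data, but that is an expectation, not a measurement.

```r
# Hypothetical small stand-in for Z; the real one has 17 columns.
Z <- data.frame(V1 = letters[1:7],
                V2 = c("p", "q", "p", "r", "q", "p", "s"),
                stringsAsFactors = FALSE)

# The original request: rows per id via aggregate(), then tabulate.
slow <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# Same numbers: table(Z$V2) counts rows per id, and tabulating
# those counts gives how many ids have 1, 2, 3, ... rows.
fast <- table(table(Z$V2))
print(fast)
```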