Re: [R] SLOW split() function

2011-10-13 Thread Joshua Wiley
Very nice! I am quite impressed at how flexible data.table is.

On Thu, Oct 13, 2011 at 1:05 AM, Matthew Dowle wrote:
> Using Josh's nice example, with data.table's built-in 'by' (optimised
> grouping) yields a 6 times speedup (100 seconds down to 15 on
> my netbook).
>
>> system.time(all.2b <- l

Re: [R] SLOW split() function

2011-10-13 Thread Matthew Dowle
Using Josh's nice example, with data.table's built-in 'by' (optimised grouping) yields a 6 times speedup (100 seconds down to 15 on my netbook).

> system.time(all.2b <- lapply(si, function(.indx) { coef(lm(y ~
+ x, data=d[.indx,])) }))
   user  system elapsed
144.501   0.300 145.525
> system.

Re: [R] SLOW split() function

2011-10-11 Thread Thomas Lumley
On Wed, Oct 12, 2011 at 4:56 AM, ivo welch wrote:
> thanks, josh.  in my posting example, I did not need anything except
> coefficients.  (when this is the case, I usually do not even use
> lm.fit, but I eliminate all missing obs first and then use solve
> crossprod(y,cbind(1,x)) crossprod(cbind(1
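The approach described in the quoted text (drop incomplete observations, then solve the normal equations directly instead of calling lm or lm.fit) might look roughly like the sketch below. The simulated data and the names `ok`, `X` and `beta` are assumptions made for illustration only; the exact expression from the original post is truncated above and is not reproduced here.

```r
# Minimal sketch, assuming a single predictor x and a response y,
# both plain numeric vectors with a few missing values.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[c(3, 50)] <- NA                 # introduce some missing observations

# Keep only rows where both x and y are observed.
ok <- complete.cases(x, y)

# Design matrix with an intercept column.
X <- cbind(1, x[ok])

# Solve the normal equations (X'X) beta = X'y.
beta <- solve(crossprod(X), crossprod(X, y[ok]))

# For comparison: lm() drops the NA rows itself under the default na.action.
fit <- coef(lm(y ~ x))
```

Solving the normal equations skips the model-frame overhead of lm(), which matters when the fit is repeated over many small groups, but it is numerically less robust than the QR decomposition lm() uses when the predictors are nearly collinear.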

Re: [R] SLOW split() function

2011-10-11 Thread Joshua Wiley
> (and assumes the data.frame doesn't include matrices
>>>> or other data.frames) and relies on split(vector,factor)
>>>> quickly splitting a vector into a list of vectors.
>>>> For a 10^6 row by 10 column data.frame split in 10^5
>>>> groups

Re: [R] SLOW split() function

2011-10-11 Thread ivo welch
omething based on this idea would help your
>>> parallelized by().
>>>
>>> mysplit.data.frame <-
>>> function (x, f, drop = FALSE, ...)
>>> {
>>>    f <- as.factor(f)
>>>    tmp <- lapply(x, function(xi) split(xi, f, drop =

Re: [R] SLOW split() function

2011-10-10 Thread Joshua Wiley
))
>>    tmp <- lapply(setNames(seq_along(tmp), names(tmp)), function(i) {
>>        t <- tmp[[i]]
>>        names(t) <- names(x)
>>        attr(t, "row.names") <- rn[[i]]
>>        class(t) <- "data.frame"
>>        t
>>    })

Re: [R] SLOW split() function

2011-10-10 Thread ivo welch
) {
>        t <- tmp[[i]]
>        names(t) <- names(x)
>        attr(t, "row.names") <- rn[[i]]
>        class(t) <- "data.frame"
>        t
>    })
>    tmp
> }
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
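The fragments of mysplit.data.frame quoted in this thread rely on split(vector, factor) being fast: each column is split separately and the pieces are then reassembled into data frames by hand. The following is a simplified sketch of that idea, not the function posted to the list. The name `split_by_column` is hypothetical, and the sketch assumes every column is a plain atomic vector or factor (no matrix or data.frame columns) and uses integer row positions as row names.

```r
# Hypothetical helper: split a data frame column by column.
# Assumes all columns are atomic vectors or factors.
split_by_column <- function(x, f) {
  f <- as.factor(f)
  # For every column, a list of per-group vectors.
  cols <- lapply(x, function(xi) split(xi, f))
  # Row positions belonging to each group.
  rn <- split(seq_len(nrow(x)), f)
  groups <- names(rn)
  lapply(setNames(groups, groups), function(g) {
    # Collect the g-th piece of every column; names are the column names.
    piece <- lapply(cols, `[[`, g)
    # Assemble a data frame directly, bypassing data.frame() overhead.
    attr(piece, "row.names") <- rn[[g]]
    class(piece) <- "data.frame"
    piece
  })
}

d <- data.frame(key = rep(c("a", "b", "c"), times = c(2, 3, 1)),
                x = 1:6,
                y = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5))
pieces <- split_by_column(d, d$key)
```

Building each piece by setting the class attribute directly avoids the per-group cost of `[.data.frame`, which is where split() on a whole data frame spends most of its time when there are many groups.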

Re: [R] SLOW split() function

2011-10-10 Thread William Dunlap
n
> Behalf Of Jim Holtman
> Sent: Monday, October 10, 2011 7:29 PM
> To: ivo welch
> Cc: r-help
> Subject: Re: [R] SLOW split() function
>
> instead of splitting the entire dataframe, split the indices and then use
> these to access your data:
> try
>
> system.tim

Re: [R] SLOW split() function

2011-10-10 Thread Dennis Murphy
I tried this:

library(data.table)
N <- 1000
T <- N*10
d <- data.table(gp= rep(1:T, rep(N,T)), val=rnorm(N*T), key = 'gp')
dim(d)
[1] 10000000        2

# On my humble 8Gb system,
> system.time(l <- d[, split(val, gp)])
   user  system elapsed
   4.15    0.09    4.27

I wouldn't be surprise

Re: [R] SLOW split() function

2011-10-10 Thread Jim Holtman
instead of splitting the entire dataframe, split the indices and then use these to access your data: try

system.time(s <- split(seq(nrow(d)), d$key))

this should be faster and less memory intensive. you can then use the indices to access the subset:

result <- lapply(s, function(.indx){ do
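A small self-contained sketch of the index-splitting idea follows. The toy data frame, the column names `key`, `x` and `y`, and the per-group lm() call are assumptions chosen for illustration (the lm() call mirrors the example used elsewhere in this thread); the body of the original lapply() is truncated above and is not reconstructed here.

```r
# Toy data: 50 groups of 20 rows each (illustrative only).
set.seed(42)
d <- data.frame(key = rep(1:50, each = 20), x = rnorm(1000))
d$y <- 3 - d$x + rnorm(1000)

# Split the row indices rather than the data frame itself.
s <- split(seq_len(nrow(d)), d$key)

# Use each index vector to pull out one group's rows on demand.
coefs <- lapply(s, function(.indx) coef(lm(y ~ x, data = d[.indx, ])))

# One row of coefficients per group.
coef_mat <- do.call(rbind, coefs)
```

Splitting an integer vector is cheap, and only one group's rows are materialised at a time, so this avoids building all the per-group data frames up front.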