Beware of facile comparisons of this sort -- they may be apples and nematodes.
I cannot speak to the others, but (1) tapply does not yield a data frame
and (2) tapply actually **is** an (efficient, disguised) loop (at the
interpreter level, essentially). I suspect what makes it so much faster
is that (a) it avoids the overhead of setting up the careful data
structures that the others provide and (b) the underlying summarizing
function is sum(), which does its work at the C level, not the
interpreted level. If it were a user function -- maybe
mysum <- function(x)sum(x) -- I suspect the discrepancy might not be so
large (try it!).

Naturally, I am prepared to be instructed and corrected on this, either
by you or by someone wiser on these matters.

-- Bert

On Wed, Nov 3, 2010 at 3:16 PM, Dimitri Liakhovitski
<dimitri.liakhovit...@gmail.com> wrote:
> Here is the summary of methods. tapply is the fastest!
>
> library(reshape)
> system.time(for(i in 1:1000)cast(melt(mydf, measure.vars = "value"),
>     city ~ brand,fun.aggregate = sum))
>    user  system elapsed
>   18.40    0.00   18.44
>
> library(reshape2)
> system.time(for(i in 1:1000)dcast(mydf,city ~ brand, sum))
>    user  system elapsed
>   12.36    0.02   12.37
>
> system.time(for(i in 1:1000)xtabs(value ~ city + brand, mydf))
>    user  system elapsed
>    2.45    0.00    2.47
>
> system.time(for(i in 1:1000)tapply(mydf$value,mydf[c('city','brand')],sum))
>    user  system elapsed
>    0.78    0.00    0.79
>
> Dimitri
>
>
> On Wed, Nov 3, 2010 at 4:32 PM, Henrique Dallazuanna <www...@gmail.com> wrote:
>> Try this:
>>
>> xtabs(value ~ city + brand, mydf)
>>
>> On Wed, Nov 3, 2010 at 6:23 PM, Dimitri Liakhovitski
>> <dimitri.liakhovit...@gmail.com> wrote:
>>>
>>> Hello!
>>>
>>> I have a data frame like this one:
>>>
>>> mydf<-data.frame(city=c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
>>> brand=c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
>>> value=c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116))
>>> (mydf)
>>>
>>> What I need to get is a data frame like the one below - cities as
>>> rows, brands as columns, and the sums of the "value" within each
>>> city/brand combination in the body of the data frame:
>>>
>>> city   x   y    z
>>>    a   3  23  336
>>>    b   7  42  231
>>>
>>> I have written a code that involves multiple loops and subindexing -
>>> but it's taking too long.
>>> I am sure there must be a more efficient way of doing it.
>>>
>>> Thanks a lot for your hints!
>>>
>>> --
>>> Dimitri Liakhovitski
>>> Ninah Consulting
>>> www.ninah.com
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>

--
Bert Gunter
Genentech Nonclinical Biostatistics
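
For anyone who wants to take up the "try it!" above, here is a minimal sketch in base R. It rebuilds mydf as posted in the original question, times tapply with the built-in sum() against the interpreted wrapper mysum() proposed in the reply, and then shows one possible way to turn the tapply result into a data frame. The names n, res_builtin, res_wrapped and wide are illustrative only and do not come from the thread. No timings are shown because they depend on the machine and the R version.

```r
# Data frame exactly as posted in the original question.
mydf <- data.frame(
  city  = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# Interpreted wrapper around sum(), as suggested in the reply.
mysum <- function(x) sum(x)

# Both calls return a matrix (city x brand), not a data frame.
res_builtin <- tapply(mydf$value, mydf[c("city", "brand")], sum)
res_wrapped <- tapply(mydf$value, mydf[c("city", "brand")], mysum)

# Time both variants; n matches the 1000 repetitions used in the thread.
n <- 1000
t_builtin <- system.time(
  for (i in seq_len(n)) tapply(mydf$value, mydf[c("city", "brand")], sum)
)
t_wrapped <- system.time(
  for (i in seq_len(n)) tapply(mydf$value, mydf[c("city", "brand")], mysum)
)
print(rbind(builtin = t_builtin, wrapped = t_wrapped)[, 1:3])

# One way (an assumption, not from the thread) to get the requested
# data frame: cities as a column, brands as further columns.
wide <- data.frame(city = rownames(res_builtin),
                   as.data.frame(res_builtin),
                   row.names = NULL)
print(wide)
```

Any fair comparison with cast(), dcast() or xtabs() should count the conversion step at the end as part of the tapply timing, since those functions return richer objects. Note also that with mydf exactly as posted, the a/z cell sums to 450 and the b/x cell to 12.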