On Mon, Apr 6, 2009 at 9:34 AM, Stavros Macrakis <macra...@alum.mit.edu> wrote:
> There are various ways to do this in R.
>
> # sample data
> dd <- data.frame(a=1:10,b=sample(3,10,replace=T),c=sample(3,10,replace=T))
>
> Using the standard built-in functions, you can use:
>
> *** aggregate ***
>
> aggregate(dd,list(b=dd$b,c=dd$c),sum)
>   b c  a b c
> 1 1 1 10 2 2
> 2 2 1  3 2 1
> ....
>
> *** tapply ***
>
> tapply(dd$a,interaction(dd$b,dd$c),sum)
>       1.1       2.1       3.1       1.2       2.2       3.2       1.3  2.3
>  5.000000  3.000000 10.000000  5.000000        NA        NA  5.000000  ...
>
> But the nicest way is probably to use the plyr package:
>
>> library(plyr)
>> ddply(dd,~b+c,sum)
>   b c V1
> 1 1 1 14
> 2 2 1  6
> ....
>
> ********
>
> Unfortunately, none of these approaches allows you to return more than one
> result from the function, so you'll need to write
>
>> ddply(dd,~b+c,length)  # count
>> ddply(dd,~b+c,sum)
>> ddply(dd,~b+c,mean)    # arithmetic average
>
> There is an 'each' function in plyr, but it doesn't seem to be compatible
> with ddply.

That's because ddply applies the function to the whole data frame, not
just the columns that aren't participating in the split. One way
around it is:

  ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

I haven't figured out a more elegant way to specify this yet.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
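
To make the difference concrete, here is a minimal sketch, assuming plyr is installed. The six-row dd is made up for illustration (it is not the sample() data from the quoted message) so that the numbers are reproducible. It contrasts passing sum directly, which sees every column of each piece, with wrapping each() so that it only sees column a:

```r
library(plyr)

# Small fixed data set, invented for this illustration.
dd <- data.frame(a = 1:6,
                 b = c(1, 1, 2, 2, 1, 2),
                 c = c(1, 1, 1, 2, 2, 2))

# Passing sum directly: each piece is the whole data frame for one (b, c)
# group, so the total includes the b and c columns as well as a.
whole <- ddply(dd, ~ b + c, sum)

# Wrapping each() in an anonymous function restricts the summaries to a.
stats <- ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

# Look up one group by value rather than relying on row order.
whole[whole$b == 1 & whole$c == 1, ]
stats[stats$b == 1 & stats$c == 1, ]
```

For the group b = 1, c = 1 the rows have a = 1 and 2, so the second call should report length 2, sum 3 and mean 1.5, while the first call's single summary column should be 7, because it also adds up the two b values and the two c values.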