Hi: For this particular task, the aggregate() function and the doBy package provide nicely formatted output, but you may have to do some renaming.
Let's try a more expansive toy example that gives us a bit more to work with.

df <- data.frame(grp1 = rep(c('x', 'y'), each = 40),
                 grp2 = rep(rep(c('x', 'y'), each = 20), 2),
                 grp3 = rep(rep(c('x', 'y'), each = 10), 4),
                 a = 1:80, b = 81:160, d = 161:240)

library(doBy)
summaryBy(a + b + d ~ grp1 + grp2 + grp3, data = df, FUN = range)
  grp1 grp2 grp3 a.FUN1 a.FUN2 b.FUN1 b.FUN2 d.FUN1 d.FUN2
1    x    x    x      1     10     81     90    161    170
2    x    x    y     11     20     91    100    171    180
3    x    y    x     21     30    101    110    181    190
4    x    y    y     31     40    111    120    191    200
5    y    x    x     41     50    121    130    201    210
6    y    x    y     51     60    131    140    211    220
7    y    y    x     61     70    141    150    221    230
8    y    y    y     71     80    151    160    231    240

aggregate(cbind(a, b, d) ~ grp1 + grp2 + grp3, data = df, FUN = range)
  grp1 grp2 grp3 a.1 a.2 b.1 b.2 d.1 d.2
1    x    x    x   1  10  81  90 161 170
2    y    x    x  41  50 121 130 201 210
3    x    y    x  21  30 101 110 181 190
4    y    y    x  61  70 141 150 221 230
5    x    x    y  11  20  91 100 171 180
6    y    x    y  51  60 131 140 211 220
7    x    y    y  31  40 111 120 191 200
8    y    y    y  71  80 151 160 231 240

Each pair of columns associated with the variables a, b and d gives the min and max of that variable within each group. (range(x) returns a vector composed of the min and max of x, in that order.)

Let's broaden our goals a bit. Suppose we want multiple summaries for multiple variables by multiple groups, say the min, max, mean, standard deviation and CV (coefficient of variation, sd/mean) for each group. Here is a simple function that takes a numeric vector x as input and returns a (named) vector with the above summaries.

f <- function(x) c(min = min(x), max = max(x), mean = mean(x), sd = sd(x), cv = sd(x)/mean(x))

# This function can be used directly in summaryBy() or aggregate():
summaryBy(a + b + d ~ grp1 + grp2 + grp3, data = df, FUN = f)
aggregate(cbind(a, b, d) ~ grp1 + grp2 + grp3, data = df, FUN = f)
# [Output omitted for the sake of brevity. Difference is in the variable names.]
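As for the renaming mentioned at the top: when FUN returns more than one value, aggregate() stores the results for each variable as a matrix column, which is why the printed names look like a.1 and a.2. A minimal sketch of one way to flatten and rename the result, assuming the toy df above and base R only (the .min/.max suffixes are just my choice for illustration):

```r
# Toy data, same as in the example above
df <- data.frame(grp1 = rep(c('x', 'y'), each = 40),
                 grp2 = rep(rep(c('x', 'y'), each = 20), 2),
                 grp3 = rep(rep(c('x', 'y'), each = 10), 4),
                 a = 1:80, b = 81:160, d = 161:240)

res <- aggregate(cbind(a, b, d) ~ grp1 + grp2 + grp3, data = df, FUN = range)

# res has only 6 columns: a, b and d are each two-column matrices
ncol(res)

# Rebuilding the data frame splits each matrix column into ordinary columns
flat <- do.call(data.frame, res)

# Replace the numeric suffixes with descriptive ones (illustrative names)
names(flat) <- sub('\\.1$', '.min', names(flat))
names(flat) <- sub('\\.2$', '.max', names(flat))
names(flat)
```

The same sub() calls can be pointed at the FUN1/FUN2 suffixes if you prefer the summaryBy() output.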
To get this to work in plyr or data.table, two packages with numerous facilities for manipulating and summarizing data, we have to modify the function so that it returns, in a single row per group, what the two calls above generate. Because ddply() operates on data frames and data tables are slightly different data animals, the function has to be rewritten slightly for each case. If you are new to packages, these have to be installed before you can load and use them:

# Uncomment the next line if you need to install these packages
# install.packages(c('data.table', 'plyr'))

# ddply() in package plyr
library(plyr)
g <- function(df) {
    c(a.min = min(df$a), a.max = max(df$a), a.mean = mean(df$a),
      a.sd = sd(df$a), a.cv = sd(df$a)/mean(df$a),
      b.min = min(df$b), b.max = max(df$b), b.mean = mean(df$b),
      b.sd = sd(df$b), b.cv = sd(df$b)/mean(df$b),
      d.min = min(df$d), d.max = max(df$d), d.mean = mean(df$d),
      d.sd = sd(df$d), d.cv = sd(df$d)/mean(df$d))
}
ddply(df, .(grp1, grp2, grp3), g)

# package data.table
library(data.table)
# key given as a character vector of column names
dt <- data.table(df, key = c('grp1', 'grp2', 'grp3'))
h <- function(df) {
    list(a.min = min(df$a), a.max = max(df$a), a.mean = mean(df$a),
         a.sd = sd(df$a), a.cv = sd(df$a)/mean(df$a),
         b.min = min(df$b), b.max = max(df$b), b.mean = mean(df$b),
         b.sd = sd(df$b), b.cv = sd(df$b)/mean(df$b),
         d.min = min(df$d), d.max = max(df$d), d.mean = mean(df$d),
         d.sd = sd(df$d), d.cv = sd(df$d)/mean(df$d))
}
dt[, h(.SD), by = list(grp1, grp2, grp3)]
# Note: the .SD as the argument of h() in the data table is a special 'sub-data' construct;
# see the package's vignette and FAQ for further details

R has a rich array of functions and a number of packages to summarize data. (I might also mention that sqldf, a package that applies SQL syntax to R data frames, would have worked well here, too.) Hopefully this gives you some idea of what can be done. All of these functions return data frames by default (a data.table is itself a data frame).
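Writing out every statistic for every variable, as in g() and h(), gets tedious. A rough sketch of how the single-row output could be built from a per-variable function instead; g2 is a hypothetical helper name of mine, the CV is taken as sd/mean to match g() and h(), and the plyr and data.table calls only run if those packages are installed:

```r
# Toy data, same as in the example above
df <- data.frame(grp1 = rep(c('x', 'y'), each = 40),
                 grp2 = rep(rep(c('x', 'y'), each = 20), 2),
                 grp3 = rep(rep(c('x', 'y'), each = 10), 4),
                 a = 1:80, b = 81:160, d = 161:240)

# Per-variable summaries; CV computed as sd/mean
f <- function(x) c(min = min(x), max = max(x), mean = mean(x),
                   sd = sd(x), cv = sd(x) / mean(x))

# Hypothetical helper: apply f() to each chosen column of a sub-data-frame
# and flatten the result into one named vector (a.min, a.max, ..., d.cv)
g2 <- function(sub, vars = c('a', 'b', 'd')) unlist(lapply(sub[vars], f))

names(g2(df))   # names are built as <variable>.<statistic>

# With plyr, g2() can stand in for the hand-written g()
if (requireNamespace('plyr', quietly = TRUE)) {
  print(head(plyr::ddply(df, c('grp1', 'grp2', 'grp3'), g2), 3))
}

# With data.table, the same idea applied to .SD, converted to a list
if (requireNamespace('data.table', quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  print(head(dt[, as.list(unlist(lapply(.SD, f))),
                by = list(grp1, grp2, grp3), .SDcols = c('a', 'b', 'd')], 3))
}
```

The explicit versions are easier to read for a handful of statistics; this form pays off once the list of variables grows.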
I might also suggest, given the nature of your questions, that you take the time to read the manual An Introduction to R, which explains many of the basic concepts and features in R.

HTH,
Dennis

On Mon, Feb 7, 2011 at 8:29 PM, Al Roark <hrbuil...@hotmail.com> wrote:
> I'd like to summarize several variables in a data frame, for multiple
> groups, and store the results in a data.frame. To do so, I'm using by(). For
> example:
>
> df<-data.frame(a=1:10,b=11:20,c=21:30,grp1=c("x","y"),grp2=c("x","y"),grp3=c("x","y"))
> dfsum<-by(df[c("a","b","c")], df[c("grp1","grp2","grp3")], range)
>
> The result has a class of "by" and a mode of "list". I'm new to R and can't
> find any documentation on this class, and don't see methods for it
> associated with the as.data.frame. How should I go about coercing this to a
> data frame? Is there a comprehensive source that I might be missing,
> which can tell me such things?
>
> Cheers
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.