Trying to find a quick/slick/easily interpretable way to collapse a data set.
Suppose I have a data set that looks like this: h <- structure(list(INDEX = structure(1:6, .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"), TICKS = c(0, 0, 0, 0, 0, 3 ), BROOD = structure(c(1L, 1L, 2L, 3L, 3L, 3L), .Label = c("501", "502", "503"), class = "factor"), HEIGHT = c(465, 465, 472, 475, 475, 475), YEAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("95", "96", "97"), class = "factor"), LOCATION = structure(c(1L, 1L, 2L, 3L, 3L, 3L), .Label = c("32", "36", "37"), class = "factor")), .Names = c("INDEX", "TICKS", "BROOD", "HEIGHT", "YEAR", "LOCATION"), row.names = c(NA, 6L), class = "data.frame") i.e., > h INDEX TICKS BROOD HEIGHT YEAR LOCATION 1 1 0 501 465 95 32 2 2 0 501 465 95 32 3 3 0 502 472 95 36 4 4 0 503 475 95 37 5 5 0 503 475 95 37 6 6 3 503 475 95 37 I want a data set that looks like this: BROOD TICKS.mean HEIGHT YEAR LOCATION 501 0 465 95 32 502 0 472 95 36 503 1 475 95 37 (for example). I.e., I want to collapse it to a dataset by brood, taking the mean of TICKS and reducing each of the other variables (would be nice to allow multiple summary statistics, e.g. TICKS.mean and TICKS.sd ...) In some ways, this is the opposite of a database join/merge operation -- I want to collapse the data frame back down. If I had the "unmerged" (i.e., the brood table) handy I could use it. I know I can construct this table a bit at a time, using tapply() or by() or aggregate() to get the means. Here's a solution that takes the first element of each factor and the mean of each numeric variable. I can imagine there are more general/flexible solutions. (One might want to specify more than one summary function, or specify that factors that vary within group should be dropped.) vtype = sapply(h,class) ## variable types [numeric or factor] vtypes = unique(vtype) ## possible types v2 = lapply(vtypes,function(z) which(vtype==z)) ## which are which? cfuns = list(factor=function(z)z[1],numeric=mean)## functions to apply m = mapply(function(w,f) { aggregate(h[w],list(h$BROOD),f) }, v2,cfuns,SIMPLIFY=FALSE) data.frame(m[[1]],m[[2]][-1]) My question is whether this is re-inventing the wheel. Is there some function or package that performs this task? cheers Ben Bolker -- View this message in context: http://www.nabble.com/collapsing-a-data-frame-tf4614195.html#a13177053 Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.