Just for the fun of it, here are two more: by and ave.
> with(basicSub, by(score, student, mean))
student: 1
[1] 55
------------------------------------------------------------
student: 2
[1] 60
------------------------------------------------------------
student: 3
[1] 67.5

Not my favorite print method; to return a vector, do instead

> as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5

You can cbind the unique student IDs to get a matrix result.

ave() is used to map the average (or comparable summary) to each
observation. By itself, it returns a vector of the same length as the
number of observations:

> with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0

It's more useful if you want to add the means to the data frame:

> transform(basicSub, avg = ave(score, student))
  student gender score  avg
1       1      m    50 55.0
2       2      m    60 60.0
3       3      f    70 67.5
4       3      f    65 67.5
5       1      m    60 55.0

That makes eight solutions. Any others? :)

Dennis

On Sun, Jan 3, 2010 at 8:14 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> Here are 6 ways:
>
> 1. aggregate
>
> > aggregate(basicSub["score"], basicSub["student"], mean)
>   student score
> 1       1  55.0
> 2       2  60.0
> 3       3  67.5
>
> 2. tapply
>
> > with(basicSub, tapply(score, student, mean))
>    1    2    3
> 55.0 60.0 67.5
>
> 3. summaryBy in doBy package
>
> > library(doBy)
> > summaryBy(. ~ student, basicSub)
>   student score.mean
> 1       1       55.0
> 2       2       60.0
> 3       3       67.5
>
> 4. sqldf in sqldf package. Uses SQL:
>
> > library(sqldf)
> > sqldf("select student, avg(score) from basicSub group by student")
>   student avg(score)
> 1       1       55.0
> 2       2       60.0
> 3       3       67.5
>
> 5. summary.formula in Hmisc
>
> > library(Hmisc)
> > summary(score ~ student, basicSub)
> score    N=5
>
> +-------+-+-+-----+
> |       | |N|score|
> +-------+-+-+-----+
> |student|1|2|55.0 |
> |       |2|1|60.0 |
> |       |3|2|67.5 |
> +-------+-+-+-----+
> |Overall| |5|61.0 |
> +-------+-+-+-----+
>
> 6. plyr (see Dennis Murphy's solution in this thread)
>
>
> On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
> <dhsha...@acad.umass.edu> wrote:
> > I want to use aggregate with the mean function on specific columns
> >
> > gender <- factor(c("m", "m", "f", "f", "m"))
> > student <- c(0001, 0002, 0003, 0003, 0001)
> > score <- c(50, 60, 70, 65, 60)
> > basicSub <- data.frame(student, gender, score)
> > basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> >
> > This doesn't work; one cannot take the mean of a factor (gender). Is
> > there any way of specifying which columns to use for the mean? I want to
> > aggregate by student, obtaining mean scores, and assume any other factors
> > are unchanging in a specific student, i.e. gender.
> >
> > Thanks
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
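
Two points in the thread are described but not shown: cbind-ing the unique student IDs onto the by() result, and carrying an unchanging factor such as gender along with the mean. The following is only a minimal sketch of both, using base R and the example data from the original question. The object names ids, means, res and agg are made up for illustration, and the formula call to aggregate() is one possible reading of the request, not something proposed by the posters.

```r
# Rebuild the example data from the original question.
gender   <- factor(c("m", "m", "f", "f", "m"))
student  <- c(1, 2, 3, 3, 1)
score    <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# (a) cbind the student IDs onto the vector returned by by().
# by() orders its groups by the sorted levels of the grouping variable,
# so the unique IDs are sorted as well to keep IDs and means aligned.
means <- as.vector(with(basicSub, by(score, student, mean)))
ids   <- sort(unique(basicSub$student))
res   <- cbind(student = ids, score = means)
res

# (b) Assumption: gender never changes within a student, so it can be
# used as a second grouping variable without splitting any student.
agg <- aggregate(score ~ student + gender, data = basicSub, FUN = mean)
agg
```

Sorting the IDs in (a) matters once the students do not first appear in increasing order; with plain unique() the IDs could be paired with the wrong means. In (b), a student whose gender did vary would show up as more than one row, which doubles as a quick check of the "unchanging" assumption.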