Re: [R] function in aggregate applied to specific columns only

Dennis Murphy Sun, 03 Jan 2010 20:58:43 -0800

Just for the fun of it, here are two more: by and ave.


> with(basicSub, by(score, student, mean))
student: 1
[1] 55
------------------------------------------------------------
student: 2
[1] 60
------------------------------------------------------------
student: 3
[1] 67.5

Not my favorite print method; to return a plain vector instead, use
> as.vector(with(basicSub, by(score, student, mean)))
[1] 55.0 60.0 67.5
You can cbind() the unique student IDs to this vector to get a matrix result.
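
A minimal sketch of that cbind step, assuming the data from the original post (rebuilt here so the snippet runs on its own). The names means and res are only illustrative:

```r
# Rebuild the example data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Group means as a plain numeric vector
means <- as.vector(with(basicSub, by(score, student, mean)))

# Pair each mean with its student ID; sort(unique()) is used so the
# IDs come out in the same (sorted) order in which by() reports groups
res <- cbind(student = sort(unique(basicSub$student)), mean = means)
res
```

The result is a numeric matrix with one row per student, so it only suits IDs that are themselves numeric.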

ave() maps the group average (or a comparable summary) back to each
observation. By itself, it returns a vector with one element per
observation:
> with(basicSub, ave(score, student))
[1] 55.0 60.0 67.5 67.5 55.0
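
For the "comparable summary" case, a different function can be handed to ave(). A small sketch on the same data (rebuilt so it runs alone; best and dev are made-up names). Note that FUN has to be given by name, because unnamed arguments after the first are taken as grouping variables:

```r
# Rebuild the example data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Each student's best score, repeated on every row for that student;
# FUN must be named, otherwise it would be read as a grouping variable
best <- with(basicSub, ave(score, student, FUN = max))

# Because the result lines up with the rows, it can be used directly
# in arithmetic: deviation of each score from the student's own mean
dev <- with(basicSub, score - ave(score, student))

best
dev
```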

It's more useful if you want to add the means to the data frame:
> transform(basicSub, avg = ave(score, student))
  student gender score  avg
1       1      m    50 55.0
2       2      m    60 60.0
3       3      f    70 67.5
4       3      f    65 67.5
5       1      m    60 55.0
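
Since ave() leaves the other columns alone, one possible way to get back to one row per student while keeping gender is to drop the raw score and remove duplicate rows. This is only a sketch: it relies on the original poster's assumption that gender never changes within a student, and withAvg and perStudent are made-up names:

```r
# Rebuild the example data from the original post
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Attach the per-student mean to every row
withAvg <- transform(basicSub, avg = ave(score, student))

# Keep only the columns that are constant within a student, then
# collapse the repeated rows
perStudent <- unique(withAvg[c("student", "gender", "avg")])
perStudent
```

If some other column did vary within a student, unique() would keep more than one row for that student, which at least makes the violated assumption visible.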

That makes eight solutions. Any others?  :)

Dennis


On Sun, Jan 3, 2010 at 8:14 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

> Here are 6 ways:
>
> 1. aggregate
>
> > aggregate(basicSub["score"], basicSub["student"], mean)
>   student score
> 1       1  55.0
> 2       2  60.0
> 3       3  67.5
>
> 2. tapply
>
> > with(basicSub, tapply(score, student, mean))
>    1    2    3
> 55.0 60.0 67.5
>
> 3. summaryBy in doBy package
>
> > library(doBy)
> > summaryBy(. ~ student, basicSub)
>   student score.mean
> 1       1       55.0
> 2       2       60.0
> 3       3       67.5
>
> 4. sqldf in sqldf package.  Uses SQL:
>
> > library(sqldf)
> > sqldf("select student, avg(score) from basicSub group by student")
>   student avg(score)
> 1       1       55.0
> 2       2       60.0
> 3       3       67.5
>
> 5. summary.formula in Hmisc
>
> > library(Hmisc)
> > summary(score ~ student, basicSub)
> score    N=5
>
> +-------+-+-+-----+
> |       | |N|score|
> +-------+-+-+-----+
> |student|1|2|55.0 |
> |       |2|1|60.0 |
> |       |3|2|67.5 |
> +-------+-+-+-----+
> |Overall| |5|61.0 |
> +-------+-+-+-----+
>
> 6. plyr (see Dennis Murphy's solution in this thread)
>
>
> On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
> <dhsha...@acad.umass.edu> wrote:
> > I want to use aggregate with the mean function on specific columns
> >
> > gender <- factor(c("m", "m", "f", "f", "m"))
> > student <- c(0001, 0002, 0003, 0003, 0001)
> > score <- c(50, 60, 70, 65, 60)
> > basicSub <- data.frame(student, gender, score)
> > basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
> >
> > This doesn't work: one cannot take the mean of a factor (gender). Is there
> > any way of specifying which columns to use for the mean? I want to
> > aggregate by student, obtaining mean scores, and assume that any other
> > factors (e.g. gender) are unchanging within a specific student.
> >
> > Thanks
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
