Re: [R] summarize dataframe based on multiple cols, not their combinations

arun Wed, 20 Mar 2013 13:49:05 -0700


Hi,
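library(plyr)
# Summarize dat by one column at a time, keeping only the rows where that column is 1
lst1 <- lapply(letters[1:3], function(i) {
  df1 <- data.frame(my_df[i], my_df["dat"])
  res <- ddply(df1, .(df1[[i]]), function(x) c("mean" = mean(x$dat), "n" = nrow(x)))
  names(res)[1] <- i
  res[res[, 1] == 1, ]
})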


# Outer-join the per-column summaries, then mark the columns that were ignored with "*"
res1 <- Reduce(function(...) merge(..., all = TRUE), lst1)
res1[is.na(res1)] <- "*"
res1
#  mean n a b c
#1   11 3 1 * *
#2   12 3 * * 1
#3   14 3 * 1 *
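
One thing to watch: merge() joins on the shared columns (mean and n here), so two grouping columns that happened to have the same mean and n would be collapsed into a single row. A base R sketch that stacks the rows instead might look like the following. The helper name summarize_each and its arguments are made up for illustration, and it assumes the level of interest in each column is 1, as in the example data.

```r
# Sample data from the original question
my_df <- data.frame(a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                    c = c(1, 0, 1, 0, 1, 0), dat = c(10, 11, 12, 13, 14, 15))

# Hypothetical helper (not from the thread): one summary row per grouping
# column, computed from the rows where that column equals 1
summarize_each <- function(df, group_cols, value_col) {
  rows <- lapply(group_cols, function(col) {
    vals <- df[[value_col]][df[[col]] == 1]
    # "*" marks the columns that were ignored for this row
    flags <- setNames(as.list(rep("*", length(group_cols))), group_cols)
    flags[[col]] <- "1"
    data.frame(flags, mean = mean(vals), n = length(vals),
               stringsAsFactors = FALSE)
  })
  # Stack the rows rather than merging, so equal summaries stay separate
  do.call(rbind, rows)
}

res2 <- summarize_each(my_df, c("a", "b", "c"), "dat")
res2
#   a b c mean n
# 1 1 * *   11 3
# 2 * 1 *   14 3
# 3 * * 1   12 3
```

This also keeps the rows in the a, b, c order of the requested layout, since nothing is sorted by the merge keys.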

A.K.



----- Original Message -----
From: Alexander Shenkin <[email protected]>
To: [email protected]
Cc: 
Sent: Wednesday, March 20, 2013 3:57 PM
Subject: [R] summarize dataframe based on multiple cols, not their combinations

Hi folks,

I'm trying to figure out how to get summarized data based on multiple
columns.  However, instead of giving summaries for every combination of
categorical columns, I want it for each value of each categorical column
regardless of the other columns.  I could do this with three different
commands, but I'm wondering if there's a more elegant way that I'm
missing.  Thanks!

allie

> my_df = data.frame(a = c(1,1,1,0,0,0), b = c(0,0,0,1,1,1),
+                    c = c(1,0,1,0,1,0), dat = c(10,11,12,13,14,15))

> my_df
  a b c dat
1 1 0 1  10
2 1 0 0  11
3 1 0 1  12
4 0 1 0  13
5 0 1 1  14
6 0 1 0  15

> # not what I want
> library(plyr)
> ddply(my_df, .(a,b,c), function(x) c("mean"=mean(x$dat), "n"=nrow(x)))
  a b c mean n
1 0 1 0   14 2
2 0 1 1   14 1
3 1 0 0   11 1
4 1 0 1   11 2

What I want:
  a b c mean n
1 1 * *   11 3
2 * 1 *   14 3
3 * * 1   12 3

where "*" refers to any value of the other columns.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

