On May 5, 2010, at 5:32 PM, utkarshsinghal wrote:

Extending my question further, I want to apply a different function to each of three fields, and the "by" argument also contains more than one field.
For example:
set.seed(100)
d = data.frame(a = sample(letters[1:2], 20, replace = TRUE),
               b = sample(3, 20, replace = TRUE),
               c = rpois(20, 1),
               d = rbinom(20, 1, 0.5),
               e = rep(c("X", "Y"), 10))

Now I want to split by fields "a" and "b", and calculate mean(c), sum(d) and "X" %in% e for each group.

Is there any function which can do this and return the output as a data frame? For the above example, it should ideally be a 6 x 5 data frame.

The split function is often used for such purposes.

?split

> lapply(split(d$c, list(d$a,d$b)), mean)
$a.1
[1] 0.3333333

$b.1
[1] 0

$a.2
[1] 0.25

$b.2
[1] 0.6666667

$a.3
[1] 1.4

$b.3
[1] 0.75

Written as x %in% "X", your third requested function returns a vector rather than a scalar, so that might pose problems:

> lapply(split(d$e, list(d$a,d$b)), function(x) { x %in% "X"})
$a.1
[1] FALSE FALSE  TRUE

$b.1
[1] FALSE

$a.2
[1]  TRUE  TRUE  TRUE FALSE

$b.2
[1] TRUE TRUE TRUE

$a.3
[1] FALSE FALSE FALSE FALSE  TRUE

$b.3
[1]  TRUE FALSE  TRUE FALSE
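
With the operands in the order given in the question ("X" %in% e), the test does return a single logical value per group. A minimal sketch on a small hand-made data frame (toy and its values are hypothetical, not the d from the question):

```r
# Hypothetical toy data, used only to illustrate the operand order
toy <- data.frame(a = c("a", "a", "b", "b"),
                  b = c(1, 1, 1, 2),
                  e = c("X", "Y", "Y", "X"))

# "X" %in% x asks "does this group contain an X?" and yields one value per group.
# drop = TRUE discards the empty a.2 combination.
flags <- sapply(split(toy$e, list(toy$a, toy$b), drop = TRUE),
                function(x) "X" %in% x)
flags
#   a.1   b.1   b.2
#  TRUE FALSE  TRUE
```

Because each group now yields one value, the result lines up with the per-group mean and sum.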

I believe the summaryBy function in the doBy package might be helpful. You might also consider some of the "describe" functions in various packages, Hmisc being the one I am familiar with. The output will probably be a list, but if it has a regular structure, the as.data.frame function may be effective.
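
One possible base R route to the requested data frame is to split the whole data frame, build a one-row data frame per group, and bind the rows. This is only a sketch; the column names mean_c, sum_d and has_X are made up for illustration, and the number of groups depends on the random draw:

```r
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# Split the rows of d by every observed combination of a and b
pieces <- split(d, list(d$a, d$b), drop = TRUE)

# Summarise each piece as a one-row data frame
rows <- lapply(pieces, function(g) {
  data.frame(a      = g$a[1],
             b      = g$b[1],
             mean_c = mean(g$c),
             sum_d  = sum(g$d),
             has_X  = "X" %in% g$e)   # one logical per group
})

# Stack the rows into a single data frame with five columns
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

Each function sees only its own column of the group, so different summaries can be mixed freely inside the anonymous function.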


Thanks in advance.

Regards,
Utkarsh Singhal



On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
Try this:


library(doBy)
summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))

  wool tension breaks.mean breaks.sum breaks.length
1    A       L    44.55556        401             9
2    A       M    24.00000        216             9
3    A       H    24.55556        221             9
4    B       L    28.22222        254             9
5    B       M    28.77778        259             9
6    B       H    18.77778        169             9
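
For comparison, a base R sketch without doBy. As far as I know, aggregate in current versions of R accepts a FUN that returns a vector (which was reportedly not the case when the question below was posted) and stores the result as a matrix column; do.call(data.frame, ...) then flattens it. The names mean, sum and n are chosen here for illustration:

```r
# warpbreaks ships with R (datasets package)
res <- aggregate(breaks ~ wool + tension, data = warpbreaks,
                 FUN = function(z) c(mean = mean(z), sum = sum(z), n = length(z)))

# res$breaks is a 6 x 3 matrix; flatten it into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
# "wool" "tension" "breaks.mean" "breaks.sum" "breaks.n"
```

The rows come out in a different order from the summaryBy output (wool varies fastest), but the values are the same.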

On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
<utkarsh.sing...@global-analytics.com>  wrote:

Hi All,

I am currently doing the following to compute summary statistics of
aggregated data:
a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
ans = cbind(a, b[,3], c[,3])

This seems unnecessarily complex to me so I tried

aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
c(mean(z),sum(z),length(z)))

but aggregate doesn't allow the FUN argument to return a vector.

I tried "by", "tapply" and several other functions as well but the output
needed further modifications to get the same format as "ans" above.

Is there any other function like aggregate that allows the FUN argument to
return a vector?

Regards
Utkarsh

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






David Winsemius, MD
West Hartford, CT
