On May 5, 2010, at 5:32 PM, utkarshsinghal wrote:
Extending my question further, I want to apply a different FUN
to each of three fields, and the "by" argument also contains more
than one field.
For example:
set.seed(100)
d = data.frame(a=sample(letters[1:2],20,replace=TRUE),
               b=sample(3,20,replace=TRUE),
               c=rpois(20,1),
               d=rbinom(20,1,0.5),
               e=rep(c("X","Y"),10))
Now I want to split by fields "a" and "b", and calculate
mean(c), sum(d) and "X" %in% e for each group.
Is there any function which can do this and return the output as a
data frame? For the above example, it should ideally be a 6*5
data frame (six a/b groups by five columns).
The split function is often used for such purposes.
?split
> lapply(split(d$c, list(d$a,d$b)), mean)
$a.1
[1] 0.3333333
$b.1
[1] 0
$a.2
[1] 0.25
$b.2
[1] 0.6666667
$a.3
[1] 1.4
$b.3
[1] 0.75
Your third requested function is not a scalar so that might pose
problems:
> lapply(split(d$e, list(d$a,d$b)), function(x) { x %in% "X"})
$a.1
[1] FALSE FALSE TRUE
$b.1
[1] FALSE
$a.2
[1] TRUE TRUE TRUE FALSE
$b.2
[1] TRUE TRUE TRUE
$a.3
[1] FALSE FALSE FALSE FALSE TRUE
$b.3
[1] TRUE FALSE TRUE FALSE
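Note that the operand order matters here. The original request was "X" %in% e, which asks whether "X" occurs anywhere in the group and so gives a single logical per group, whereas x %in% "X" tests each element separately. A minimal sketch of the difference, using small made-up vectors (e and g below are illustrative only, not the poster's data):

```r
# Toy data, invented for illustration: six values in two groups
e <- c("X", "Y", "X", "Y", "Y", "Y")
g <- c("a", "a", "a", "b", "b", "b")

# One logical per element of each group (what x %in% "X" gives)
per_element <- lapply(split(e, g), function(x) x %in% "X")

# One logical per group: is "X" present anywhere in the group?
per_group <- sapply(split(e, g), function(x) "X" %in% x)
per_group
#     a     b 
#  TRUE FALSE 
```

Written this way the third summary is a scalar per group, like mean and sum.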
I believe the summaryBy function in the doBy package might be helpful.
You might also consider some of the "describe" functions in various
packages, Hmisc being the one I have familiarity with. Output will
probably be a list, but if it has a regular structure, the
as.data.frame function may be effective.
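For what it is worth, the split approach can also be carried through to a data frame in base R alone. The following is only a sketch under assumptions: it splits the whole data frame rather than one column, builds a one-row data frame per group, and binds the rows together. The output column names (mean.c, sum.d, X.in.e) are made up for illustration, and the random draws will differ between R versions, so no output is shown.

```r
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# Split the rows by every a/b combination; drop = TRUE skips empty groups
groups <- split(d, list(d$a, d$b), drop = TRUE)

# One summary row per group, each column using its own function
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(a      = g$a[1],
             b      = g$b[1],
             mean.c = mean(g$c),
             sum.d  = sum(g$d),
             X.in.e = "X" %in% g$e)   # scalar: is "X" present in the group?
}))
rownames(res) <- NULL
res
```

The result has one row per non-empty a/b group and five columns, which matches the 6*5 shape asked for whenever all six combinations occur in the data.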
Thanks in advance.
Regards,
Utkarsh Singhal
On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
Try this:
library(doBy)
summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
  wool tension breaks.mean breaks.sum breaks.length
1    A       L    44.55556        401             9
2    A       M    24.00000        216             9
3    A       H    24.55556        221             9
4    B       L    28.22222        254             9
5    B       M    28.77778        259             9
6    B       H    18.77778        169             9
On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
<utkarsh.sing...@global-analytics.com> wrote:
Hi All,
I am currently doing the following to compute summary statistics of
aggregated data:
a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
ans = cbind(a, b[,3], c[,3])
This seems unnecessarily complex to me so I tried
aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
c(mean(z),sum(z),length(z)))
but aggregate doesn't allow the FUN argument to return a vector.
I tried "by", "tapply" and several other functions as well, but the
output needed further modifications to get the same format as "ans" above.
Is there any other function like aggregate which allows the FUN
argument to return a vector?
Regards
Utkarsh
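Readers on a more recent version of R may find that the restriction described above no longer applies: aggregate there accepts a FUN that returns a vector and stores the results as a single matrix column. The sketch below assumes such a version; the do.call(data.frame, ...) step is one common way (not the only one) to spread that matrix column into ordinary columns.

```r
# warpbreaks ships with R in the datasets package
res <- aggregate(breaks ~ wool + tension, data = warpbreaks,
                 FUN = function(z) c(mean = mean(z),
                                     sum = sum(z),
                                     length = length(z)))

# res has three columns; "breaks" is a 6 x 3 matrix.
# Rebuilding the data frame column by column flattens it.
ans <- do.call(data.frame, res)
names(ans)
ans
```

The flattened result has the columns wool, tension, breaks.mean, breaks.sum and breaks.length, though the rows come out ordered with wool varying fastest within each tension level.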
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT