Re: [R] Aggregate with non-scalar function

Mike Nielsen Wed, 07 Nov 2007 17:17:05 -0800

Yes, that's exactly it!

Many thanks, it all comes back to me now!  It's the darn do.call that I can
never remember somehow.  I know I've reinvented this wheel several times --
you'd think I'd learn.  Sigh.
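
For the archive, and for the next time I forget: a minimal sketch of the
idiom on toy data. The data frame d and its columns g and v are made up for
illustration and are not from my real dataset.

```r
# Toy data (hypothetical, not from the thread): two groups, one numeric column.
d <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

# split() gives one data frame per group; lapply() turns each piece into a
# one-row data frame holding several summary statistics at once.
pieces <- lapply(split(d, d$g), function(p) {
  data.frame(g = p$g[1], v.mean = mean(p$v), v.max = max(p$v))
})

# do.call() calls rbind(pieces[[1]], pieces[[2]], ...), stacking the rows.
res <- do.call(rbind, pieces)
print(res)
```

The same pattern works on the list returned by by(), which is what Jim's
answer below does.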


Again, my thanks!

Regards,

Mike

On Nov 7, 2007 2:35 PM, jim holtman <[EMAIL PROTECTED]> wrote:

> Is this closer to what you would like?
>
> > x <- textConnection("     hostName user sys idle             date time
> + 10142     fred  0.4 0.5 98.0 2007-11-01 02:02:18
> + 16886   barney  0.5 0.2 94.6 2007-10-25 19:12:12
> + 8795      fred  0.0 0.1 99.8 2007-10-30 05:08:22
> + 5261      fred  0.1 0.2 99.7 2007-10-25 07:20:32
> + 12427   barney  0.1 0.2 93.2 2007-10-19 14:34:10
> + 18067   barney  0.1 0.2 99.4 2007-10-27 10:34:08
> + 973       fred  0.0 0.2 99.8 2007-10-19 08:24:22
> + 5426      fred  0.2 0.3 99.5 2007-10-25 12:50:33
> + 7067      fred  0.1 0.2 99.4 2007-10-27 19:32:27
> + 13159   barney  0.1 0.4 84.3 2007-10-20 14:58:11
> + 17481   barney  1.2 2.0 92.6 2007-10-26 15:02:11
> + 21632   barney  0.1 0.1 99.6 2007-11-01 09:24:09
> + 206       fred 19.4 4.8 53.7 2007-10-18 06:50:34
> + 18151   barney  0.1 0.2 94.9 2007-10-27 13:22:09
> + 10662     fred  0.1 0.2 99.6 2007-11-01 19:22:27
> + 10376     fred  0.0 0.2 99.7 2007-11-01 09:50:24
> + 3630      fred 43.7 7.0 33.0 2007-10-23 00:58:27
> + 1118      fred  0.6 0.4 98.9 2007-10-19 13:14:23
> + 5122      fred  0.1 0.2 99.6 2007-10-25 02:42:21
> + 22117   barney  0.0 0.2 99.4 2007-11-02 01:34:04")
> > x.in <- read.table(x, header=TRUE, as.is=TRUE)
> > x.in$hour <- sapply(strsplit(x.in$time, ":"), '[', 1) # pick off the hour
> > x.by <- by(x.in, list(x.in$hour, x.in$hostName), function(.host){
> +     data.frame(hostName=.host$hostName[1], hour=.host$hour[1],
> +       user.mean=mean(.host$user),
> +       sys.mean=mean(.host$sys),
> +       idle.mean=mean(.host$idle),
> +       user.max=max(.host$user),
> +       sys.max=max(.host$sys),
> +       idle.max=max(.host$idle))
> + })
> > do.call('rbind', x.by)
>   hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
> 1    barney   01      0.00     0.20     99.40      0.0     0.2     99.4
> 2    barney   09      0.10     0.10     99.60      0.1     0.1     99.6
> 3    barney   10      0.10     0.20     99.40      0.1     0.2     99.4
> 4    barney   13      0.10     0.20     94.90      0.1     0.2     94.9
> 5    barney   14      0.10     0.30     88.75      0.1     0.4     93.2
> 6    barney   15      1.20     2.00     92.60      1.2     2.0     92.6
> 7    barney   19      0.50     0.20     94.60      0.5     0.2     94.6
> 8      fred   00     43.70     7.00     33.00     43.7     7.0     33.0
> 9      fred   02      0.25     0.35     98.80      0.4     0.5     99.6
> 10     fred   05      0.00     0.10     99.80      0.0     0.1     99.8
> 11     fred   06     19.40     4.80     53.70     19.4     4.8     53.7
> 12     fred   07      0.10     0.20     99.70      0.1     0.2     99.7
> 13     fred   08      0.00     0.20     99.80      0.0     0.2     99.8
> 14     fred   09      0.00     0.20     99.70      0.0     0.2     99.7
> 15     fred   12      0.20     0.30     99.50      0.2     0.3     99.5
> 16     fred   13      0.60     0.40     98.90      0.6     0.4     98.9
> 17     fred   19      0.10     0.20     99.50      0.1     0.2     99.6
> >
>
>
>  On 11/7/07, Mike Nielsen <[EMAIL PROTECTED]> wrote:
> > R-Helpers,
> >
> > I'm sorry to have to ask this -- I've not used R very much in the last
> > 8 or 10 months, and I've gotten rusty.
> >
> > I have the following (ff2 is a subset of a much, much larger dataset):
> >
> > > ff2
> >      hostName user sys idle             obsTime
> > 10142     fred  0.4 0.5 98.0 2007-11-01 02:02:18
> > 16886   barney  0.5 0.2 94.6 2007-10-25 19:12:12
> > 8795      fred  0.0 0.1 99.8 2007-10-30 05:08:22
> > 5261      fred  0.1 0.2 99.7 2007-10-25 07:20:32
> > 12427   barney  0.1 0.2 93.2 2007-10-19 14:34:10
> > 18067   barney  0.1 0.2 99.4 2007-10-27 10:34:08
> > 973       fred  0.0 0.2 99.8 2007-10-19 08:24:22
> > 5426      fred  0.2 0.3 99.5 2007-10-25 12:50:33
> > 7067      fred  0.1 0.2 99.4 2007-10-27 19:32:27
> > 13159   barney  0.1 0.4 84.3 2007-10-20 14:58:11
> > 17481   barney  1.2 2.0 92.6 2007-10-26 15:02:11
> > 21632   barney  0.1 0.1 99.6 2007-11-01 09:24:09
> > 206       fred 19.4 4.8 53.7 2007-10-18 06:50:34
> > 18151   barney  0.1 0.2 94.9 2007-10-27 13:22:09
> > 10662     fred  0.1 0.2 99.6 2007-11-01 19:22:27
> > 10376     fred  0.0 0.2 99.7 2007-11-01 09:50:24
> > 3630      fred 43.7 7.0 33.0 2007-10-23 00:58:27
> > 1118      fred  0.6 0.4 98.9 2007-10-19 13:14:23
> > 5122      fred  0.1 0.2 99.6 2007-10-25 02:42:21
> > 22117   barney  0.0 0.2 99.4 2007-11-02 01:34:04
> >
> > > doit(ff2)
> >   hostName hour user.mean sys.mean idle.mean user.max sys.max idle.max
> > 1    barney   01      0.00     0.20     99.40      0.0     0.2     99.4
> > 2    barney   09      0.10     0.10     99.60      0.1     0.1     99.6
> > 3    barney   10      0.10     0.20     99.40      0.1     0.2     99.4
> > 4    barney   13      0.10     0.20     94.90      0.1     0.2     94.9
> > 5    barney   14      0.10     0.30     88.75      0.1     0.4     93.2
> > 6    barney   15      1.20     2.00     92.60      1.2     2.0     92.6
> > 7    barney   19      0.50     0.20     94.60      0.5     0.2     94.6
> > 8      fred   00     43.70     7.00     33.00     43.7     7.0     33.0
> > 9      fred   02      0.25     0.35     98.80      0.4     0.5     99.6
> > 10     fred   05      0.00     0.10     99.80      0.0     0.1     99.8
> > 11     fred   06     19.40     4.80     53.70     19.4     4.8     53.7
> > 12     fred   07      0.10     0.20     99.70      0.1     0.2     99.7
> > 13     fred   08      0.00     0.20     99.80      0.0     0.2     99.8
> > 14     fred   09      0.00     0.20     99.70      0.0     0.2     99.7
> > 15     fred   12      0.20     0.30     99.50      0.2     0.3     99.5
> > 16     fred   13      0.60     0.40     98.90      0.6     0.4     98.9
> > 17     fred   19      0.10     0.20     99.50      0.1     0.2     99.6
> > > doit
> > function(x){
> > x.mean<-aggregate(x[,c("user","sys","idle")],
> >                             by=list(hostName=x$hostName,
> >                                     hour=strftime(as.POSIXlt(x$obsTime),"%H")),
> >                             mean)
> >
> > x.max<-aggregate(x[,c("user","sys","idle")],
> >                           by=list(hostName=x$hostName,
> >                                   hour=strftime(as.POSIXlt(x$obsTime),"%H")),
> >                           max)
> >
> > t1<-merge(x.mean,x.max,by=c("hostName","hour"),suffixes=c(".mean",".max"))
> > return(t1)
> > }
> >
> > The point of the "doit" function is to make a new dataframe in which
> > the columns are summary statistics of certain columns in the argument.
> >
> > Is there a function similar to:
> >
> > magic.function(ff2[,c("user","sys","idle")],
> >      by=list(hostName=ff2$hostName,hour=strftime(as.POSIXlt(ff2$obsTime),"%H")),
> >      function(x){c(mean.user=mean(x$user),
> >                        mean.system=mean(x$sys),
> >                        mean.idle=mean(x$idle),
> >                        max.user=max(x$user),
> >                        max.system=max(x$sys),
> >                        max.idle=max(x$idle))})
> >
> > i.e., an "aggregate" that can cope with a non-scalar function and "do
> > what I mean"?  My doit function gets more and more ugly the more
> > summary statistics I add, and I worry about the "merge" with hundreds
> > of thousands of rows.
> >
> > I'm almost sure I've seen a solution to what I know is a simple
> > problem, but I guess my search skills are as bad as my "R": I've
> > rummaged around the r-help archives and came up with nothing to show
> > for it.
> >
> >
> > Pointers would be gratefully received.
> >
> > Many thanks.
> > --
> > Regards,
> >
> > Mike Nielsen
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
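
A further sketch for the archive, closer to the "magic.function" asked about
above. It assumes an R version in which aggregate() accepts a FUN that
returns a vector; that may not hold for the R release current when this
thread was written, so treat it as an assumption rather than a recipe. The
data frame ff is a hand-copied four-row extract of ff2 with the hour already
pulled out of obsTime.

```r
# Four rows copied from the ff2 listing in this thread; the hour column was
# extracted from obsTime by hand for this sketch.
ff <- data.frame(
  hostName = c("fred", "fred", "barney", "barney"),
  hour     = c("02", "02", "14", "14"),
  user     = c(0.4, 0.1, 0.1, 0.1),
  sys      = c(0.5, 0.2, 0.2, 0.4),
  idle     = c(98.0, 99.6, 93.2, 84.3)
)

# FUN returns a named vector, so each summarised column comes back as a
# matrix with one sub-column per statistic.
agg <- aggregate(cbind(user, sys, idle) ~ hostName + hour, data = ff,
                 FUN = function(v) c(mean = mean(v), max = max(v)))

# do.call(data.frame, ...) spreads the matrix columns into ordinary columns
# named user.mean, user.max, sys.mean, and so on.
flat <- do.call(data.frame, agg)
print(flat)
```

The columns come out grouped by variable rather than by statistic, so the
order differs from the doit() output, but no merge() is needed and adding
another statistic is a one-word change inside FUN.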



-- 
Regards,

Mike Nielsen


