Here is a sample of what I'm trying to do:

structure(list(C_lo = c(0.00392581816943354, 0.00901222644518829,
0.00484396253385175, 0.00822377400482716, 0.00780070460187192,
0.00952688235337435), C_hi = c(0.00697755827622381, 0.0123301031600017,
0.0113207627868435, 0.0112887993422598, 0.018567245397701, 0.0195253894885054
), house = c(1, 1, 1, 1, 1, 1), date = c(719, 1027, 1027, 1027,
1030, 1030), hour = c(18, 8, 8, 8, 11, 11), .Names = c("1000",
"10000", "10001", "10002", "10003", "10004"), press = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Names = c("1000", "10000", "10001", "10002",
"10003", "10004"), .Label = c("DEPR", "PRESS"), class = "factor")), .Names = c("C_lo",
"C_hi", "house", "date", "hour", "number", "press"
), class = "data.frame", row.names = c("1000", "10000", "10001",
"10002", "10003", "10004"))
I'd like to aggregate the data by date: a table with the median C_lo and C_hi values for each date. I'd also like to plot these points with date on the x-axis and C on the y-axis, with lines going through the medians.

For plyr, would it be something like:

ddply(results, .(date),median, na.rm=T)

I tried making a for loop to get the medians, but that doesn't work either:

splitresults = split (results, results$date, drop=T)
mediann <- matrix (,seq_along(splitresults),2)
for (i in seq_along(splitresults)) {
  piece <- splitresults[[i]]
  mediann [i,1] <- unique(piece$date)
  mediann [i,2] <- median (piece$n, na.rm=T)
}

Jeff

----------------------------------------
> Date: Fri, 5 Aug 2011 11:59:37 -0700
> Subject: Re: [R] Aggregating data
> From: djmu...@gmail.com
> To: johjeff...@hotmail.com
> CC: r-help@r-project.org
>
> Hi:
>
> This is the type of problem at which the plyr package excels. Write a
> utility function that produces the plot you want using a data frame as
> its input argument, and then do something like
>
> library('plyr')
> d_ply(results, .(a, b, c), plotfun)
>
> where plotfun is a placeholder for the name of your plot
> function. The d in d_ply means to take a data frame as input and _
> means return nothing. This is used in particular when a side effect,
> such as a plot, is the desired 'output'. See
> http://www.jstatsoft.org/v40/i01, which contains an example (baseball)
> where groupwise plots are produced. (Don't actually run the example
> unless you're willing to wait for 1100+ ggplots to be rendered :)
>
> If memory serves, you should also be able to produce graphics for each
> data subset using the data.table package as well.
>
> If you want a more concrete solution, provide a more concrete example.
>
> HTH,
> Dennis
>
> On Fri, Aug 5, 2011 at 9:55 AM, Jeffrey Joh <johjeff...@hotmail.com> wrote:
> >
> > I aggregated my data:
> >
> > aggresults <- aggregate(results, by=list(results$a, results$b, results$c), FUN=mean, na.rm=TRUE)
> >
> > results has about 8000 lines of data, and aggresults has about 80 lines. I
> > would like to create a separate variable for each of the 80 aggregates,
> > each containing the 100 lines that were aggregated. I would also like to
> > create plots for each of those 80 datasets.
> >
> > Is there a way of automating this, so that I don't have to do each of the
> > 80 aggregates individually?
> >
> > Jeff
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
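
For the median-by-date question at the top of this message, here is a minimal sketch in base R. It is only one possible approach, not necessarily what the plyr route would look like. It assumes the data frame is called results, as in the code above, and rebuilds just the C_lo, C_hi and date columns from the posted sample. The name med and the plotting choices (point symbols, line types, legend position) are illustrative assumptions.

```r
# Sample rows copied from the posted data; only the columns needed here.
results <- data.frame(
  C_lo = c(0.00392581816943354, 0.00901222644518829, 0.00484396253385175,
           0.00822377400482716, 0.00780070460187192, 0.00952688235337435),
  C_hi = c(0.00697755827622381, 0.0123301031600017, 0.0113207627868435,
           0.0112887993422598, 0.018567245397701, 0.0195253894885054),
  date = c(719, 1027, 1027, 1027, 1030, 1030)
)

# Median of each C column within each date.
# na.rm = TRUE is passed through to median() for every group.
med <- aggregate(results[c("C_lo", "C_hi")],
                 by = list(date = results$date),
                 FUN = median, na.rm = TRUE)
print(med)

# Date on the x-axis, C on the y-axis, lines through the per-date medians.
plot(med$date, med$C_hi, type = "b", pch = 1, lty = 1,
     ylim = range(med$C_lo, med$C_hi, na.rm = TRUE),
     xlab = "date", ylab = "C")
lines(med$date, med$C_lo, type = "b", pch = 2, lty = 2)
legend("topleft", legend = c("C_hi", "C_lo"), pch = c(1, 2), lty = c(1, 2))
```

The ddply(results, .(date), median, na.rm=T) attempt most likely fails because median() is handed each whole per-date data frame rather than a single column. With plyr loaded, something like ddply(results, .(date), summarise, C_lo = median(C_lo, na.rm = TRUE), C_hi = median(C_hi, na.rm = TRUE)) should give the same table, though that variant was not run against the full data.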