Hi: David evidently left the ddply() part to me :)
Here's one way to summarize the data and get a plot in ggplot2. Firstly,
thank you for the dput(); you score extra points for that :) I put that
output in an object named results.

## Step 1: Summarize the data in plyr

# ggplot2 used to load plyr and reshape in the process; loading them
# explicitly is harmless and also works with newer versions
library('ggplot2')
library('plyr')      # ddply()
library('reshape')   # melt()

# (1a) Compute the group medians; this is your 'table'
#      summarise simply returns the summaries and
#      grouping variable
resumm <- ddply(results, .(date), summarise,
                C_lo = median(C_lo, na.rm = TRUE),
                C_hi = median(C_hi, na.rm = TRUE))

# (1b) Put the medians into the full data frame instead
#      (they overwrite C_lo and C_hi because the names match).
#      This is the purpose of transform (as a
#      substitute for summarise in the call)
resumm2 <- ddply(results, .(date), transform,
                 C_lo = median(C_lo, na.rm = TRUE),
                 C_hi = median(C_hi, na.rm = TRUE))

## Aside: It's not a coincidence that the names of the
## median variables are the same in resumm and resumm2.
## This is by design, so that we can generate a 'nice'
## legend below.

# Melt the raw data and the summary data frame so that the
# lo/hi variables become merged into a factor with
# a corresponding value variable
resmelt <- melt(results, measure.vars = c('C_lo', 'C_hi'))
resumMelt <- melt(resumm, id.vars = 'date')

## Step 2: Plot the points and the median lines

## Two plots are now generated. The only real difference
## between them is that the former treats date as
## numeric and the latter treats it as a factor. The lines are
## plotted first so that the points are not obscured.
ggplot(data = resmelt, aes(x = date)) +
    geom_line(data = resumMelt, aes(y = value, group = variable,
                                    colour = variable), size = 1) +
    geom_point(aes(y = value, colour = variable), size = 2.5) +
    labs(x = 'Date', y = 'C', colour = 'Level') +
    scale_colour_manual('variable', breaks = c('C_lo', 'C_hi'),
                        values = c('blue', 'red'),
                        labels = c('Low', 'High'))

ggplot(data = resmelt, aes(x = factor(date))) +
    geom_line(data = resumMelt, aes(x = factor(date), y = value,
                                    group = variable,
                                    colour = variable), size = 1) +
    geom_point(aes(y = value, colour = variable), size = 2.5) +
    labs(x = 'Date', y = 'C', colour = 'Level') +
    scale_colour_manual(breaks = levels(resmelt$variable),
                        values = c('blue', 'red'),
                        labels = c('Low', 'High'))

The manual scale gives one the option to define one's own set of colors
rather than the defaults supplied by ggplot2. In this case I chose to
reset the legend labels, but if C_lo and C_hi are what you want, remove
the two lines with labels = ...

HTH,
Dennis

On Mon, Aug 8, 2011 at 4:51 PM, Jeffrey Joh <johjeff...@hotmail.com> wrote:
>
> Here is a sample of what I'm trying to do:
>
> structure(list(C_lo = c(0.00392581816943354, 0.00901222644518829,
> 0.00484396253385175, 0.00822377400482716, 0.00780070460187192,
> 0.00952688235337435), C_hi = c(0.00697755827622381, 0.0123301031600017,
> 0.0113207627868435, 0.0112887993422598, 0.018567245397701, 0.0195253894885054
> ), house = c(1, 1, 1, 1, 1, 1), date = c(719, 1027, 1027,
> 1027, 1030, 1030), hour = c(18, 8, 8, 8, 11, 11), .Names = c("1000",
> "10000",
> "10001", "10002", "10003", "10004"), press = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L), .Names = c("1000", "10000",
> "10001", "10002", "10003", "10004"), .Label = c("DEPR",
> "PRESS"), class = "factor")), .Names = c("C_lo", "C_hi",
> "house", "date", "hour", "number", "press"
> ), class = "data.frame", row.names = c("1000", "10000",
> "10001", "10002", "10003", "10004"))
>
> I'd like to aggregate the data by the date.
> I'd like to have a table with the median C_lo and C_hi values grouped by
> date. I'd also like to plot these points with date on the x-axis, C on the
> y-axis, and lines going through these medians.
>
> For plyr, would it be something like: ddply(results, .(date),median, na.rm=T)
>
> I tried making a for loop to get the medians, but that doesn't work either.
> splitresults = split (results, results$date, drop=T)
> mediann <- matrix (,seq_along(splitresults),2)
> for (i in seq_along(splitresults)) {
> piece <- splitresults[[i]]
> mediann [i,1] <- unique(piece$date)
> mediann [i,2] <- median (piece$n, na.rm=T)
> }
>
> Jeff
>
> ----------------------------------------
>> Date: Fri, 5 Aug 2011 11:59:37 -0700
>> Subject: Re: [R] Aggregating data
>> From: djmu...@gmail.com
>> To: johjeff...@hotmail.com
>> CC: r-help@r-project.org
>>
>> Hi:
>>
>> This is the type of problem at which the plyr package excels. Write a
>> utility function that produces the plot you want using a data frame as
>> its input argument, and then do something like
>>
>> library('plyr')
>> d_ply(results, .(a, b, c), plotfun)
>>
>> where plotfun is a placeholder for the name of your plot function. The
>> d in d_ply means to take a data frame as input and _ means return
>> nothing. This is used in particular when a side effect, such as a plot,
>> is the desired 'output'. See http://www.jstatsoft.org/v40/i01, which
>> contains an example (baseball) where groupwise plots are produced.
>> (Don't actually run the example unless you're willing to wait for
>> 1100+ ggplots to be rendered :)
>>
>> If memory serves, you should also be able to produce graphics for each
>> data subset using the data.table package as well.
>>
>> If you want a more concrete solution, provide a more concrete example.
>>
>> HTH,
>> Dennis
>>
>> On Fri, Aug 5, 2011 at 9:55 AM, Jeffrey Joh <johjeff...@hotmail.com> wrote:
>> >
>> > I aggregated my data: aggresults <-aggregate(results, by=list(results$a,
>> > results$b, results$c), FUN=mean, na.rm=TRUE)
>> >
>> > results has about 8000 lines of data, and aggresults has about 80 lines. I
>> > would like to create a separate variable for each of the 80 aggregates,
>> > each containing the 100 lines that were aggregated. I would also like to
>> > create plots for each of those 80 datasets.
>> >
>> > Is there a way of automating this, so that I don't have to do each of the
>> > 80 aggregates individually?
>> >
>> > Jeff
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >