Hi Jean,

Thanks for the help. I couldn't quite get the results I needed with the merge command, but I ended up using the following workaround:
Weather <- read.csv("Weather.csv")
# Distance of each observation from noon (time is on a 0-1 scale, 0.5 = noon)
Weather$diff.time <- abs(0.5 - Weather$TimeNumeric)
# Position of the closest-to-noon record *within* each day
agg <- aggregate(diff.time ~ Date, data = Weather, FUN = which.min)
# Number of rows preceding each day (assumes Weather is sorted by Date,
# in the same order that aggregate() returns the dates)
n.obs <- cumsum(rle(as.character(Weather$Date))$lengths)
n.obs <- c(0, head(n.obs, -1))
# Convert the within-day positions to row numbers in Weather
noon.ind <- agg$diff.time + n.obs
noon.obs <- Weather[noon.ind, ]

Cheers,
Sean

On Mon, Dec 19, 2011 at 6:03 AM, Jean V Adams <jvad...@usgs.gov> wrote:

>
> Sean Baumgarten wrote on 12/14/2011 06:38:08 PM:
>
> > Hello,
> >
> > I have a data frame with hourly or sub-hourly weather records that span
> > several years, and from that data frame I'm trying to select only the
> > records taken closest to noon for each day. Here's what I've done so far:
> >
> > # Add a column to the data frame showing the difference between noon and
> > # the observation time (I converted time to a 0-1 scale so 0.5 represents noon):
> > data$Diff_from_noon <- abs(0.5-data$Time)
> >
> > # Find the minimum value of "Diff_from_noon" for each Date:
> > aggregated <- aggregate(Diff_from_noon ~ Date, data, FUN=min)
> >
> >
> > The problem is that the "aggregated" data frame only has two columns: Date
> > and Diff_from_noon. I can't figure out how to get the columns with the
> > actual weather variables to carry over from the original data frame.
> >
> > Any suggestions you have would be much appreciated.
> >
> > Thanks,
> > Sean
>
>
> You don't provide any example data, so I will use data from R datasets,
> airquality. After using the aggregate() function to find the minimum Day
> for each Month, merge the resulting data frame with the original data frame
> to see all the columns corresponding to the selected minimums.
>
> > aggregated <- aggregate(Day ~ Month, airquality, FUN=min)
> > aggregated
>   Month Day
> 1     5   1
> 2     6   1
> 3     7   1
> 4     8   1
> 5     9   1
> > merge(aggregated, airquality)
>   Month Day Ozone Solar.R Wind Temp
> 1     5   1    41     190  7.4   67
> 2     6   1    NA     286  8.6   78
> 3     7   1   135     269  4.1   84
> 4     8   1    39      83  6.9   81
> 5     9   1    96     167  6.9   91
>
> For your data, the code would look like this:
> aggregated <- aggregate(Diff_from_noon ~ Date, data, FUN=min)
> merge(aggregated, data)
>
> I recommend that you use a name other than "data" for your data frame,
> since data() is a built-in R function.
>
> Jean

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
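For comparison, here is a minimal sketch of two per-day selections on a small made-up Weather data frame. The column names Date, TimeNumeric and Temp and all values are assumptions for illustration, not the contents of Sean's Weather.csv. Splitting by day and calling which.min() inside each piece does not depend on the rows being sorted, unlike the cumulative-offset workaround. Aggregating the minimum and merging it back, as Jean suggested, joins on both Date and Diff_from_noon and returns every row tied for the minimum.

```r
# Hypothetical example data: two days of readings. The column names
# (Date, TimeNumeric, Temp) and all values are made up for illustration.
Weather <- data.frame(
  Date        = as.Date(c("2011-12-14", "2011-12-14", "2011-12-14",
                          "2011-12-15", "2011-12-15")),
  TimeNumeric = c(0.25, 0.48, 0.75, 0.40, 0.55),  # 0-1 scale, 0.5 = noon
  Temp        = c(3.1, 7.4, 5.0, 4.2, 6.8)
)

# Approach 1: split by day and keep the row closest to noon in each piece.
# which.min() returns the first minimum, so a tie keeps the earlier record.
closest_to_noon <- function(d) d[which.min(abs(0.5 - d$TimeNumeric)), ]
noon.obs <- do.call(rbind, lapply(split(Weather, Weather$Date), closest_to_noon))
rownames(noon.obs) <- NULL

# Approach 2 (Jean's suggestion): aggregate the per-day minimum distance,
# then merge() it back on the shared columns Date and Diff_from_noon.
# Every row tied for the minimum is returned, so a day can appear twice.
Weather$Diff_from_noon <- abs(0.5 - Weather$TimeNumeric)
aggregated <- aggregate(Diff_from_noon ~ Date, data = Weather, FUN = min)
merged <- merge(aggregated, Weather)

print(noon.obs)
print(merged)
```

With this made-up data both approaches pick the 0.48 and 0.55 readings; they differ only when two records on the same day are equally close to noon.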