On Fri, Nov 5, 2010 at 8:24 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Fri, Nov 5, 2010 at 1:22 PM, thornbird <huachang...@gmail.com> wrote:
>>
>> I am new to using R for data analysis. I have an incomplete time series
>> dataset that is in daily format. I want to extract only Friday data from it.
>> However, there are two problems with it.
>>
>> First, if Friday data is missing in that week, I need to extract the data of
>> the day prior to that Friday (e.g. Thursday).
>>
>> Second, sometimes there are duplicate Friday data (say Friday morning and
>> afternoon), but I only need the latest one (Friday afternoon).
>>
>> My question is how I can only extract the Friday data and make it a new
>> dataset so that I have data for every single week for the convenience of
>> data analysis.
>>
>
>
> There are several approaches depending on exactly what is to be
> produced. We show two of them here using zoo.
>
>
> # read in data
>
> Lines <- " views number timestamp day time
> 1 views 910401 1246192687 Sun 6/28/2009 12:38
> 2 views 921537 1246278917 Mon 6/29/2009 12:35
> 3 views 934280 1246365403 Tue 6/30/2009 12:36
> 4 views 986463 1246888699 Mon 7/6/2009 13:58
> 5 views 995002 1246970243 Tue 7/7/2009 12:37
> 6 views 1005211 1247079398 Wed 7/8/2009 18:56
> 7 views 1011144 1247135553 Thu 7/9/2009 10:32
> 8 views 1026765 1247308591 Sat 7/11/2009 10:36
> 9 views 1036856 1247436951 Sun 7/12/2009 22:15
> 10 views 1040909 1247481564 Mon 7/13/2009 10:39
> 11 views 1057337 1247568387 Tue 7/14/2009 10:46
> 12 views 1066999 1247665787 Wed 7/15/2009 13:49
> 13 views 1077726 1247778752 Thu 7/16/2009 21:12
> 14 views 1083059 1247845413 Fri 7/17/2009 15:43
> 15 views 1083059 1247845824 Fri 7/17/2009 18:45
> 16 views 1089529 1247914194 Sat 7/18/2009 10:49"
>
> library(zoo)
>
> # read in and create a zoo series
> # - skip= skip over the header line
> # - index= the time index is the third non-removed column
> # - format= convert the index to Date class using the indicated format
> # - col.names= as specified
> # - aggregate= over duplicate dates, keeping the last
> # - colClasses= specifies "NULL" for columns we want to remove
>
> colClasses <-
>   c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")
>
> col.names <- c(NA, NA, "views", "number", NA, NA, NA)
>
> # z <- read.zoo("myfile.dat", skip = 1, index = 3,
> z <- read.zoo(textConnection(Lines), skip = 1, index = 3,
>   format = "%m/%d/%Y", col.names = col.names,
>   aggregate = function(x) tail(x, 1), colClasses = colClasses)
>
> ## Now that we have read it in, let's process it
>
> ## 1.
>
> # extract all Thursdays and Fridays
> z45 <- z[format(time(z), "%w") %in% 4:5, ]
>
> # keep the last entry in each week
> # and show the result on the R console
> z45[!duplicated(format(time(z45), "%U"), fromLast = TRUE), ]
>
>
> # 2. alternative approach
> # the above approach labels each point as it was originally labelled,
> # so if Thursday is used it gets the date of that Thursday.
> # Another approach is to always label the resulting point as Friday
> # and also use the last available value even if it's not Thursday
>
> # create daily grid
> g <- seq(start(z), end(z), by = "day")
>
> # fill in daily grid so Friday is filled in with the prior value
> # if Friday is NA
> z.filled <- na.locf(z, xout = g)
>
> # extract Fridays (including those filled in from previous days)
> # and show the result on the R console
> z.filled[format(time(z.filled), "%w") == "5", ]
>
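For readers who would rather not use zoo, the logic of approach 1 above can be sketched in base R. This sketch was not part of the original reply. It assumes the rows are already in a data frame sorted by timestamp. Here rows 4 to 15 of the sample data are typed in by hand, and the column names `views`, `timestamp` and `date` are hypothetical. The week key is the year-qualified "%Y-%U".

```r
# Sketch of approach 1 in base R (no zoo). Rows 4-15 of the sample data;
# the column names views/timestamp/date are hypothetical.
d <- data.frame(
  views     = c(986463, 995002, 1005211, 1011144, 1026765, 1036856,
                1040909, 1057337, 1066999, 1077726, 1083059, 1083059),
  timestamp = c(1246888699, 1246970243, 1247079398, 1247135553,
                1247308591, 1247436951, 1247481564, 1247568387,
                1247665787, 1247778752, 1247845413, 1247845824),
  date      = as.Date(c("2009-07-06", "2009-07-07", "2009-07-08",
                        "2009-07-09", "2009-07-11", "2009-07-12",
                        "2009-07-13", "2009-07-14", "2009-07-15",
                        "2009-07-16", "2009-07-17", "2009-07-17"))
)

# Sort by timestamp so that "last" means "latest reading".
d <- d[order(d$timestamp), ]

# Keep Thursdays ("4") and Fridays ("5"); %w runs from "0" (Sunday) to "6".
thu_fri <- d[format(d$date, "%w") %in% c("4", "5"), ]

# Year-qualified, Sunday-based week number.
week <- format(thu_fri$date, "%Y-%U")

# Keep only the last row of each week: this drops the earlier of two
# same-day readings and prefers Friday over Thursday when both exist.
weekly <- thu_fri[!duplicated(week, fromLast = TRUE), ]
print(weekly)
```

The rows are sorted by timestamp before `duplicated(..., fromLast = TRUE)` is applied, so the 18:45 Friday reading wins over the 15:43 one. This mirrors what `aggregate = function(x) tail(x, 1)` does in the zoo version.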
Note that if the data can span more than one year then "%U" above should
be replaced with "%Y-%U" so that weeks in one year are not lumped with
weeks in other years.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
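There is one caveat on the "%Y-%U" week key, added here as a hedged suggestion rather than as part of the original reply. %U counts weeks from the first Sunday of each year. A week that straddles New Year therefore gets two different keys, and both its Thursday and its Friday could survive the duplicated() step. One possible alternative, assuming the index is of class Date, is to key each date by the Sunday that starts its week. The helper week_start() below is hypothetical, not a zoo function.

```r
# Hypothetical helper: label each Date by the Sunday that starts its week.
week_start <- function(dates) {
  # %w gives the weekday as a string, "0" (Sunday) to "6" (Saturday);
  # subtracting that many days lands on the preceding (or same) Sunday.
  dates - as.integer(format(dates, "%w"))
}

thu <- as.Date("2015-12-31")  # a Thursday
fri <- as.Date("2016-01-01")  # the Friday of the same week

format(c(thu, fri), "%Y-%U")  # "2015-52" "2016-00": the week is split
week_start(c(thu, fri))       # "2015-12-27" "2015-12-27": one week
```

In approach 1 this would mean using `duplicated(week_start(time(z45)), fromLast = TRUE)` in place of the `format(time(z45), "%U")` key. That should apply because `read.zoo(..., format = "%m/%d/%Y")` produces a Date index.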