On Fri, Nov 5, 2010 at 1:22 PM, thornbird <huachang...@gmail.com> wrote:
>
> I am new to Using R for data analysis. I have an incomplete time series
> dataset that is in daily format. I want to extract only Friday data from it.
> However, there are two problems with it.
>
> First, if Friday data is missing in that week, I need to extract the data of
> the day prior to that Friday (e.g. Thursday).
>
> Second, sometimes there are duplicate Friday data (say Friday morning and
> afternoon), but I only need the latest one (Friday afternoon).
>
> My question is how I can only extract the Friday data and make it a new
> dataset so that I have data for every single week for the convenience of
> data analysis.
>
There are several approaches depending on exactly what is to be produced.
We show two of them here using zoo.

# read in data

Lines <- "  views  number  timestamp day            time
1  views  910401 1246192687 Sun 6/28/2009 12:38
2  views  921537 1246278917 Mon 6/29/2009 12:35
3  views  934280 1246365403 Tue 6/30/2009 12:36
4  views  986463 1246888699 Mon  7/6/2009 13:58
5  views  995002 1246970243 Tue  7/7/2009 12:37
6  views 1005211 1247079398 Wed  7/8/2009 18:56
7  views 1011144 1247135553 Thu  7/9/2009 10:32
8  views 1026765 1247308591 Sat 7/11/2009 10:36
9  views 1036856 1247436951 Sun 7/12/2009 22:15
10 views 1040909 1247481564 Mon 7/13/2009 10:39
11 views 1057337 1247568387 Tue 7/14/2009 10:46
12 views 1066999 1247665787 Wed 7/15/2009 13:49
13 views 1077726 1247778752 Thu 7/16/2009 21:12
14 views 1083059 1247845413 Fri 7/17/2009 15:43
15 views 1083059 1247845824 Fri 7/17/2009 18:45
16 views 1089529 1247914194 Sat 7/18/2009 10:49"

library(zoo)

# read in and create a zoo series
# - skip= skips over the header
# - index= the time index is the third non-removed column
# - format= converts the index to Date class using the indicated format
# - col.names= as specified
# - aggregate= aggregates over duplicate dates, keeping the last
# - colClasses= specifies "NULL" for columns we want to remove

colClasses <- c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")
col.names <- c(NA, NA, "views", "number", NA, NA, NA)

# z <- read.zoo("myfile.dat", skip = 1, index = 3,
z <- read.zoo(textConnection(Lines), skip = 1, index = 3,
  format = "%m/%d/%Y", col.names = col.names,
  aggregate = function(x) tail(x, 1), colClasses = colClasses)

## Now that we have read it in, let's process it

## 1.

# extract all Thursdays and Fridays
z45 <- z[format(time(z), "%w") %in% 4:5, ]

# keep last entry in each week
# (the year is part of the key so that the same week number in
# different years is not treated as a duplicate)
# and show result on R console
z45[!duplicated(format(time(z45), "%Y-%U"), fromLast = TRUE), ]

## 2. alternative approach

# the above approach labels each point as it was originally labelled,
# so if Thursday is used it gets the date of that Thursday.
# Another approach is to always label the resulting point as Friday
# and also use the last available value even if it's not Thursday.

# create daily grid
g <- seq(start(z), end(z), by = "day")

# fill in daily grid so Friday is filled in with prior value
# if Friday is NA
z.filled <- na.locf(z, xout = g)

# extract Fridays (including those filled in from previous)
# and show result on R console
z.filled[format(time(z.filled), "%w") == "5", ]

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
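
For comparison with the second approach, here is a minimal sketch of the same "label every week by its Friday" idea in base R, without zoo. It is only an illustration under stated assumptions: the data frame `d`, its columns `date` and `value`, and the numbers in it are made up and are not the poster's data. Weeks are taken to run Saturday through Friday, and rows are assumed to be sorted by time already. Unlike the na.locf version, a week with no observations at all is simply absent from the result rather than filled from an earlier week.

```r
# Hypothetical daily data (values invented for illustration),
# possibly with several rows per day, already sorted by time.
d <- data.frame(
  date  = as.Date(c("2009-07-08", "2009-07-09", "2009-07-11",
                    "2009-07-16", "2009-07-17", "2009-07-17")),
  value = c(10, 11, 12, 13, 14, 15)
)

# Day of week as a number: 0 = Sunday, ..., 5 = Friday, 6 = Saturday
wday <- as.integer(format(d$date, "%w"))

# Date of the Friday on or after each observation;
# this serves as the label of the Saturday-to-Friday week
d$friday <- d$date + (5 - wday) %% 7

# Keep only the last row within each Friday-labelled week
weekly <- d[!duplicated(d$friday, fromLast = TRUE), c("friday", "value")]
weekly
```

With this made-up input, the week ending 2009-07-10 has no Friday row, so its Thursday value is used, and the week ending 2009-07-17 keeps the later of the two Friday rows.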