On Mon, Dec 3, 2012 at 8:30 PM, Andrew Freedman <andr...@hotsprings.com.au> wrote:
> Hi List,
>
> I have weekly sales observations for several products drawn via ODBC.
> Source data is available at
> https://www.dropbox.com/s/78vxae5ic8tnutf/asr.csv.
>
> This is retail sales data, so it will contain seasonality and trend
> information. I expect to see 52 or 53 observations per year, each
> observation occurring on the same day of the week (Saturday). Ultimately
> I'm looking to feed these series into forecasting models for demand
> planning.
>
> The data has internal gaps, so while I've been able to create a ts that
> appears to respect the frequency and period, I suspect that a zoo is
> going to be a better data container. Unfortunately, I don't understand
> how to use zoo() to describe frequency/period/deltat.
>
> In the example below I use sales[,16] (aka $p) as it has several
> periods (data between 2004 and 2012). I've tried using frequency=52, =7
> and =1, but get the same result each time; every data point ends up in
> cycle 1 and I don't have the periodicity needed to find seasonality.
>
>> sales <- read.csv("asr.csv")
>> library(zoo)
>
> Attaching package: 'zoo'
>
> The following object(s) are masked from 'package:base':
>
>     as.Date, as.Date.numeric
>
>> sales.zoo <- zoo(subset(sales, select=c(2:length(sales))), order.by=
> + sales$date_end, frequency = 52)
>> sales.zoo.i <- na.approx(sales.zoo) # interpolate internal NA values
>> frequency(sales.zoo.i) # 52, which seems right
> [1] 52
>> cycle(sales.zoo.i[1:20,16]) # everything is in the same cycle...
> 2004-08-14 2004-08-21 2004-08-28 2004-09-04 2004-09-11 2004-09-18
>          1          1          1          1          1          1
> 2004-09-25 2004-10-02 2004-10-09 2004-10-16 2004-10-23 2004-10-30
>          1          1          1          1          1          1
> 2004-11-06 2004-11-13 2004-11-20 2004-11-27 2004-12-04 2004-12-11
>          1          1          1          1          1          1
> 2004-12-18 2004-12-25 2005-01-01 2005-01-08 2005-01-15 2005-01-22
>          1          1          1          1          1          1
> 2005-01-29 2005-02-05 2005-02-12 2005-02-19 2005-02-26 2005-03-05
>          1          1          1          1          1          1
>>
>
> Doubtless it's some facile error that will make me feel sheepish, but
> I've been staring at this for a bit now and just getting nowhere. Any
> pointers would be greatly appreciated.
>
A complete cycle is always represented by 1 time unit, so if you want a
complete cycle to be a year then you need to represent time in years and
fractions of a year, not as "Date" class. That is how "ts" class works too.
With a "Date" index the time unit is one day and every observation falls
on a whole day, so cycle() puts every point at the same position, which is
why you get 1 whichever frequency you specify.

Since weeks don't evenly divide years you will have to approximate this in
order to have a frequency of 52. There are many ways to do this, but below
we drop the observations in week 00 (as given by %W, the week of the year
with Monday as the first day of the week) and number the remaining weeks
01 to 52. This guarantees that no year has more than 52 weeks. In years
where a Saturday falls in week 00 but the year has only 52 Saturdays, only
51 weeks remain, which leaves a one-week gap at the end of that year; a
"zooreg" series allows such gaps.

library(zoo)
z <- read.zoo("asr.csv", sep = ",", header = TRUE)

# drop week "00"
z0 <- z[ format(time(z), "%W") != "00" ]
t0 <- time(z0)

# convert time to year + fraction of a year
time(z0) <- as.numeric(format(t0, "%Y")) +
    (as.numeric(format(t0, "%W")) - 1) / 52

# convert to zooreg class (almost regularly spaced)
zr <- as.zooreg(z0)
frequency(zr) # 52
head(cycle(zr))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
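The two points in the reply can be checked without the CSV file. Below is a base-R sketch that needs no zoo. It assumes one observation every Saturday from 2004-08-14 (the first date in the posted output) through 2012-12-29. It shows that a "Date" index only ever holds whole days, and that the year-plus-fraction index built from %W puts each kept week at position 1 to 52 within its year. The position is computed by hand as 1 plus the number of 1/52 steps past the start of the year. This is a stand-in for the idea behind cycle(), not zoo's own code.

```r
# Assumption: one observation every Saturday from 2004-08-14 (the first
# date in the posted output) through 2012-12-29; no data values needed.
sat <- seq(as.Date("2004-08-14"), as.Date("2012-12-29"), by = "week")

# A "Date" index counts days since 1970-01-01, so every time is a whole
# number of days and no observation lies part way through a time unit.
day_num <- as.numeric(sat)
print(all(day_num %% 1 == 0))

# Drop Saturdays in week "00" (%W: Monday is the first day of the week)
s0 <- sat[format(sat, "%W") != "00"]
yr <- as.numeric(format(s0, "%Y"))
wk <- as.numeric(format(s0, "%W"))

# Year plus fraction of a year, as in the reply
t0 <- yr + (wk - 1) / 52

# Position within the year in steps of 1/52, counted from 1
# (a hand-rolled stand-in for cycle(), not zoo's implementation)
pos <- round((t0 - floor(t0)) * 52) + 1

print(range(pos))
# Weekly observations kept per calendar year after dropping week 00
print(table(yr))
```

In this date range, 2005 and 2011 each lose their 53rd Saturday (January 1), while 2008, 2009 and 2010 keep only 51 weeks. Those three years start on a Tuesday, Thursday and Friday respectively, so their first Saturday falls before the first Monday.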