I'm not completely sure what your test setup does, but wouldn't it be simpler to take a moving average at every point, but set the window dynamically to be over the previous hour?

In pseudocode, for the j-th sample:

meanval[j] <- mean(Value[ DateTime[(DateTime-DateTime[j])>0 & (Datetime-DateTime[j]<3600)] ] )

where I'm just letting DateTime values be in seconds.

If you really want to reject all averages that would have 'looked back' over a missing sample, I'd play with rle() to find the bad points and use that to reduce the meanval vector I described above.

Carl


********quote****
I have a time series with interval of 2.5 minutes, or 24 observations per hour. I am trying to find a 1 hr moving average, looking backward, so that moving average at n = mean(n-23 : n)

The time series has about 1.5 million rows, with occasional gaps due to poor data quality. I only want to take a 1 hour moving average for those periods that are complete, i.e. have 24 observations in the previous hour.

The data is in 3 columns

Value DateTime interval

For example:
Value <- rnorm (100, 50, 3) #my data has 1.5 million rows; using 100 here for simple example DateTime <- seq(from = 915148800, to=915156150, by =150) #time steps 1:50 at 150 second intervals DateTime [51] <- 915156450 #skip one time step; DateTime[52:100] <- seq(from = 915156600, to =915163800, by = 150) #resume time steps of 150 seconds

x <- cbind (Value, DateTime)
x <- as.data.frame(x)
x$DateTime <-as.POSIXct(x$DateTime, origin="1970-01-01", tz="GMT") x1 <- x[-c(1:23), ] #trimming x to create direct comparison of DateTimes in x and x1 x[,3] <- difftime(x1[,2], x[,2], units="mins") #ignore warning message colnames(x) [3] <- "interval" x[24:nrow(x),3] <- x[1:(nrow(x)-23),3] #set interval to be the number of minutes between n-23 and n.

        #57.5 indicates no gaps in the previous hour up to and including n.

# >57.5 indicates a gap in the previous n-23 rows.
x[1:23,3] <- 0/0 #NaN assigned to first 23 rows so as not to take average of first hour.
#as expected, row 51: 73 indicates a gap, i.e. interval > 57.5

index <- which (x [,3] == 57.5) #which rows have no gaps in previous hour

#loop to calculate 1 hour moving average, only for periods which are complete
for (i in 1:length(index)) {

    x [index[i],4] <- mean(x[index[i]-23,1] : x[index[i],1])   }

#This loop works on this simple example; but this takes VERY long time to run on x with 1.5 million rows. Over 1 hour running and still not complete. I also tried increasing memory.limit to 4095, but still very slow.

Any suggestions to make this run faster? I thought about using the lag function but could not get it to work

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to