Thank you for the suggestion Jannis. I ended up using the function 
'running' in package gregmisc.   'rollyapply' was not able to process all 
the data on my computer. Happy holidays to you and the R community 



Jannis <bt_jan...@yahoo.de> 
12/16/2010 06:56 PM

To
josef.kar...@phila.gov
cc
r-help@r-project.org
Subject
Re: [R] moving average with gaps in time series






Hi Josef,

is there any particular reason why you want to use your own function? 
Have a look at the stats functions or special timeseries functions for 
R. I am sure you will find something that calculates an ordinary moving 
average (and a bunch of fancier stuff).

'filter' (stats) for example can be used for moving averages. You could 
also use 'rollmean' from package zoo.

Not sure how to handle the missing values here, though (both ways most 
probably work with NAs). You could try 'rollaply 'from package zoo with 
a function that sums the valid values in the window and returns TRUE in 
case all are valid and FALSE if they are not. Not sure whether this is 
faster than your solution though.

Are you sure you want to compute a moving average with your missing 
values specification? Or do you just want to aggregate your values to 
hourly means, with those values removed where not all values are valid? 
(in that case 'aggregate' would be your function)


HTH
Jannis


josef.kar...@phila.gov schrieb:
> I have a time series with interval of 2.5 minutes, or 24 observations 
per 
> hour.  I am trying to find a 1 hr moving average, looking backward, so 
> that moving average at n = mean(n-23 : n)
>
> The time series has about 1.5 million rows, with occasional gaps due to 
> poor data quality.  I only want to take a 1 hour moving average for 
those 
> periods that are complete, i.e. have 24 observations in the previous 
hour.
>
> The data is in 3 columns
>
> Value   DateTime        interval
>
> For example:
> Value <- rnorm (100, 50, 3)  #my data has 1.5 million rows; using 100 
here 
> for simple example
> DateTime <-  seq(from = 915148800, to=915156150, by =150)  #time steps 
> 1:50 at 150 second intervals
> DateTime [51] <- 915156450 #skip one time step; 
> DateTime[52:100] <- seq(from = 915156600, to =915163800, by = 150) 
#resume 
> time steps of 150 seconds
> 
> x <- cbind (Value, DateTime)
> x <- as.data.frame(x)
> x$DateTime <-as.POSIXct(x$DateTime, origin="1970-01-01", tz="GMT") 
> x1 <- x[-c(1:23), ]   #trimming x to create direct comparison of 
DateTimes 
> in x and x1
> x[,3] <-    difftime(x1[,2], x[,2], units="mins")  #ignore warning 
message
> colnames(x) [3]  <- "interval"
> x[24:nrow(x),3] <- x[1:(nrow(x)-23),3]     #set interval to be the 
number 
> of minutes between n-23 and n. 
>         #57.5 indicates no gaps in the previous hour up to and including 

> n. 
>         #    >57.5 indicates a gap in the previous n-23 rows. 
> x[1:23,3] <- 0/0  #NaN assigned to first 23 rows so as not to take 
average 
> of first hour.
> #as expected, row 51: 73 indicates a gap, i.e. interval > 57.5
>
> index <- which (x [,3] == 57.5)  #which rows have no gaps in previous 
hour
>
> #loop to calculate 1 hour moving average, only for periods which are 
> complete
> for (i in 1:length(index)) {
>     x [index[i],4] <- mean(x[index[i]-23,1] : x[index[i],1])
>   } 
>
> #This loop works on this simple example; but this takes VERY long time 
to 
> run  on x with 1.5 million rows.  Over 1 hour running and still not 
> complete.  I also tried increasing memory.limit to 4095, but still very 
> slow.
> 
> Any suggestions to make this run faster?  I thought about using the lag 
> function but could not get it to work
>
>
> 
>                [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> 



        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to