On 11/05/2012 7:06 AM, Tim Dorscheidt wrote:
Dear R-users,

I have been using R and its core-packages with great satisfaction now for many years, and have 
recently started using the "ccf" function (part of the "stats" package version 
2.16.0), about which I have a question.

The "ccf"-algorithm for calculating the cross-correlation between two time series always calculates the mean 
and standard deviation per time series beforehand, thereby using a constant value for these irrespective of any 
time-lag. Another piece of statistical software that I'm using, a toolbox in Matlab, does this in a fundamentally 
different way. It first "chops off" the parts of the time-series that do not overlap when a time-lag has been 
introduced, and then calculates a new mean and standard deviation from the overlapping parts only. This latter 
method can, at least in theory, still reach a cross-correlation of 1 (or -1) at any time-lag, whereas 
the "ccf" method of R seems to introduce zeros at the non-overlapping parts of the time series, thereby 
preventing this possibility and producing very different results. Take for instance the two time series: a = {1,3,2} 
and b = {3,2,1}. The query "ccf(a,b)" produces the output {-0.5, -0.5, 0.5}, but I would think that 
a time-lag of -1 should produce a cross-correlation here of 1, since the two 
time series will overlap with identical parts {3,2}.
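
To make the difference between the two approaches concrete, here is a minimal Python sketch of both estimators as described above. It is not the code of R's "ccf" or of the Matlab toolbox; the function names ccf_fixed and ccf_overlap are hypothetical, and the sketch assumes that lag k pairs x[t+k] with y[t] and that |k| is smaller than the series length.

```python
from math import sqrt


def ccf_fixed(x, y, k):
    """Lag-k cross-correlation with full-series means and standard deviations.

    Assumed convention: pairs x[t+k] with y[t]. The sum of products is
    divided by n, not by the number of overlapping pairs.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    s = sum((x[t + k] - mx) * (y[t] - my) for t in range(n) if 0 <= t + k < n)
    return s / (n * sx * sy)


def ccf_overlap(x, y, k):
    """Pearson correlation of the overlapping parts only.

    Means and sums of squares are recomputed for each lag.
    Returns NaN when an overlapping part is constant.
    """
    n = len(x)
    pairs = [(x[t + k], y[t]) for t in range(n) if 0 <= t + k < n]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((u - mx) * (v - my) for u, v in pairs)
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


a = [1, 3, 2]
b = [3, 2, 1]
lags = [-1, 0, 1]
fixed = [ccf_fixed(a, b, k) for k in lags]
overlap = [ccf_overlap(a, b, k) for k in lags]
print(fixed)
print(overlap)
```

Under this pairing convention the fixed-mean version stays at -0.5, -0.5 and 0.5 for lags -1, 0 and 1, while the overlap-only version reaches -1 at lag -1 and 1 at lag +1. Which sign labels which shift depends on the convention of the package.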

I have attached clean implementations (removing all dependencies) of how the R algorithm 
seems to calculate cross-correlations with time-lag (it produces identical results to 
"ccf"), and how this other method (in Matlab) calculates it (with newly 
calculated means and standard deviation for each time-lag).

Could someone be so kind as to explain to me why the "ccf"-algorithm has this 
specific implementation that seems to, at least for specific situations, produce results 
with artifacts? It is very likely that the R-implementation, as opposed to the 
alternative algorithm described above and in the attachment, has a very good statistical 
explanation, but one that unfortunately is not dawning on me.

I haven't looked at the ccf code (and your attachment didn't make it through), but I would guess from your description that ccf produces positive semi-definite covariances, and the Matlab routine does not. That means that if you use the estimated covariances to compute variances of linear combinations of terms, you may be able to get negative answers from the Matlab routine. Sometimes this matters.
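
As a small illustration of what that can look like, here is a Python sketch for the simpler case of the autocorrelation of a single series. It is not the ccf code or the Matlab routine; the function names and the example series are made up for the illustration. It builds the 3x3 correlation matrix of three consecutive terms from each estimator and computes the implied variance of their sum.

```python
from math import sqrt


def acf_fixed(x, k):
    """Lag-k autocorrelation with one overall mean and division by n."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t + k] - m) * (x[t] - m) for t in range(n - k)) / n
    return ck / c0


def acf_overlap(x, k):
    """Pearson correlation of the two overlapping segments at lag k."""
    n = len(x)
    u, v = x[k:], x[:n - k]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / sqrt(suu * svv)


def quad_form(r, w):
    """w' R w for the Toeplitz matrix R[i][j] = r[|i - j|]."""
    m = len(w)
    return sum(w[i] * w[j] * r[abs(i - j)] for i in range(m) for j in range(m))


x = [1, 3, 2, 0]   # made-up example series
w = [1, 1, 1]      # weights: the plain sum of three consecutive terms

r_fixed = [acf_fixed(x, k) for k in range(3)]
r_overlap = [acf_overlap(x, k) for k in range(3)]

v_fixed = quad_form(r_fixed, w)      # implied variance, fixed-mean estimator
v_overlap = quad_form(r_overlap, w)  # implied variance, overlap-only estimator
print(v_fixed, v_overlap)
```

For this series the fixed-mean estimator gives a positive implied variance, while the overlap-only estimator gives a negative one, which no real variance can be.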

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
