On 11/05/2012 7:06 AM, Tim Dorscheidt wrote:
Dear R-users,
I have been using R and its core-packages with great satisfaction now for many years, and have
recently started using the "ccf" function (part of the "stats" package version
2.16.0), about which I have a question.
The "ccf"-algorithm for calculating the cross-correlation between two time series always calculates the mean
and standard deviation per time series beforehand, thereby using a constant value for these irrespective of any
time-lag. Another piece of statistical software that I'm using, a toolbox in Matlab, does this in a fundamentally
different way. It first "chops off" the parts of the time-series that do not overlap when a time-lag has been
introduced, and then calculates a new mean and standard deviation to be used for further calculations. The latter
method has the advantage that a cross-correlation of 1 (or -1) remains attainable at every time-lag, whereas
the "ccf"-method of R seems to introduce zeros at the non-overlapping parts of the time-series, thereby
preventing this possibility and producing very different results. Take for instance the two time series: a = {1,3,2}
and b = {3,2,1}. The query "ccf(a,b)" produces the output {-0.5, -0.5, 0.5}, but I would think that
a time-lag of -1 should produce a cross-correlation here of 1, since the two
time series will overlap with identical parts {3,2}.
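For reference, the two calculations described above can be sketched as follows. This is illustrative Python, not the R source and not the Matlab toolbox code; the function names ccf_fixed and ccf_overlap are hypothetical, and the sketch assumes the sign convention in which lag k pairs x[t+k] with y[t].

```python
from math import sqrt

def ccf_fixed(x, y, lag):
    """Cross-correlation using full-series means and SDs and divisor n
    (the calculation described above for R's ccf; assumed, not copied)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    # Only overlapping pairs enter the sum, but the divisor stays n.
    s = sum((x[t + lag] - mx) * (y[t] - my)
            for t in range(n) if 0 <= t + lag < n)
    return s / (n * sx * sy)

def ccf_overlap(x, y, lag):
    """Pearson correlation of the overlapping parts only, with the mean
    and SD recomputed per lag. Needs at least two overlapping points
    with nonzero variance, otherwise the division fails."""
    n = len(x)
    pairs = [(x[t + lag], y[t]) for t in range(n) if 0 <= t + lag < n]
    m = len(pairs)
    mx = sum(p for p, _ in pairs) / m
    my = sum(q for _, q in pairs) / m
    sxy = sum((p - mx) * (q - my) for p, q in pairs)
    sxx = sum((p - mx) ** 2 for p, _ in pairs)
    syy = sum((q - my) ** 2 for _, q in pairs)
    return sxy / sqrt(sxx * syy)

a = [1, 3, 2]
b = [3, 2, 1]
lags = [-1, 0, 1]
fixed = [ccf_fixed(a, b, k) for k in lags]      # approximately [-0.5, -0.5, 0.5]
overlap = [ccf_overlap(a, b, k) for k in lags]  # approximately [-1.0, -0.5, 1.0]
print(fixed)
print(overlap)
```

Under the sign convention assumed in this sketch, the identical overlap {3,2} falls at lag +1 rather than -1; the overlap-based method returns 1 there, while the fixed-mean method returns 0.5.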
I have attached clean implementations (removing all dependencies) of how the R algorithm
seems to calculate cross-correlations with time-lag (it produces identical results to
"ccf"), and how this other method (in Matlab) calculates it (with newly
calculated means and standard deviation for each time-lag).
Could someone be so kind as to explain to me why the "ccf"-algorithm has this
specific implementation that seems to, at least for specific situations, produce results
with artifacts? It is very likely that the R-implementation, as opposed to the
alternative algorithm described above and in the attachment, has a very good statistical
explanation, but one that unfortunately is not dawning on me.
I haven't looked at the ccf code (and your attachment didn't make it
through), but I would guess from your description that ccf produces
positive semi-definite covariances, and the Matlab routine does not.
That means that if you use the estimated covariances to compute
variances of linear combinations of terms, you may be able to get
negative answers from the Matlab routine. Sometimes this matters.
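A small numerical illustration of that point, as a sketch only: it uses autocorrelations of a single made-up series rather than the cross-correlations from the question, and it is written in Python, not taken from the ccf source or from the Matlab toolbox. The names acf_fixed, acf_overlap and quad_form are hypothetical.

```python
from math import sqrt

def acf_fixed(x, lag):
    """Autocorrelation with one overall mean and divisor n at every lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d) / n
    ck = sum(d[t] * d[t + lag] for t in range(n - lag)) / n
    return ck / c0

def acf_overlap(x, lag):
    """Pearson correlation of the two overlapping segments, with means
    and SDs recomputed for each lag."""
    u, v = x[:len(x) - lag], x[lag:]
    m = len(u)
    mu, mv = sum(u) / m, sum(v) / m
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / sqrt(suu * svv)

def quad_form(r, w):
    """w' R w for the Toeplitz matrix R[i][j] = r[|i - j|], i.e. the implied
    variance of the linear combination sum(w[i] * X[t+i]) in SD units."""
    k = len(w)
    return sum(w[i] * w[j] * r[abs(i - j)] for i in range(k) for j in range(k))

x = [1, 2, 3, 1]   # made-up series, chosen only to show the effect
w = [1, 1, 1]      # "variance" of X[t] + X[t+1] + X[t+2]

r_fixed = [acf_fixed(x, k) for k in range(3)]
r_overlap = [acf_overlap(x, k) for k in range(3)]

var_fixed = quad_form(r_fixed, w)      # positive (about 1)
var_overlap = quad_form(r_overlap, w)  # negative (about -1)
print(var_fixed, var_overlap)
```

For this series the per-lag estimates are 1, -0.5 and -1 at lags 0, 1 and 2, which no valid correlation matrix can have, so the implied variance of the sum comes out negative. The fixed-mean estimates give a non-negative value for the same weights.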
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.